New paper: An opponent striatal circuit for distributional reinforcement learning
Adam Lowet’s PhD work, “An opponent striatal circuit for distributional reinforcement learning,” was just published in Nature. In this collaboration with Melissa Meng and Sara Matias from Naoshige Uchida’s lab (Harvard MCB) and Qiao Zheng from this lab, we show how striatal circuits might use distributional reinforcement learning to learn distributions over expected rewards.
Standard theories of how we learn to make decisions from rewarded feedback assume that these decisions are guided only by average rewards, even when the rewards themselves are stochastic. This stands in contrast to recent machine learning advances, known as distributional reinforcement learning, which have achieved state-of-the-art performance by learning whole reward distributions rather than just their averages. Furthermore, recent work by Google DeepMind in collaboration with Naoshige Uchida’s lab revealed that the activity of dopamine neurons in the ventral tegmental area (VTA) of mice - neurons involved in learning reward expectations - supports the idea that our brains also learn such reward distributions.
Together with Naoshige Uchida’s lab, we set out to identify brain areas in the mouse striatum that store these learned reward distributions. Spearheaded by Adam Lowet, then a graduate student in the Uchida lab, and supported by Qiao Zheng (Drugowitsch lab) as well as Melissa Meng and Sara Matias (Uchida lab), we indeed identified neural populations that appear to represent these distributions. Furthermore, we adapted distributional reinforcement learning algorithms to match known striatal circuit structures, and confirmed the resulting model by perturbing the activity of these populations.
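To give a flavor of the core idea (this is a minimal, illustrative sketch, not the paper’s actual circuit model): a population of value-learning units that differ in how strongly they weight positive versus negative prediction errors will converge to different expectiles of the reward distribution, so the population as a whole encodes the distribution rather than just its mean. All variable names below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    # A stochastic reward: equally likely to be small (1) or large (5).
    return rng.choice([1.0, 5.0])

n_cells = 20
values = np.zeros(n_cells)

# Each unit gets a different asymmetry between its learning rates for
# positive vs. negative prediction errors; optimistic units (large
# alpha_pos relative to alpha_neg) settle near the top of the reward
# distribution, pessimistic ones near the bottom.
alpha_pos = np.linspace(0.02, 0.18, n_cells)
alpha_neg = alpha_pos[::-1]

for _ in range(5000):
    r = sample_reward()
    delta = r - values                       # prediction error per unit
    lr = np.where(delta > 0, alpha_pos, alpha_neg)
    values += lr * delta                     # asymmetric update
```

After learning, `values` spans the range between the two reward outcomes, with each unit's estimate determined by its learning-rate asymmetry - a simple population code for the full reward distribution.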
For further details please consult Adam’s paper.
You can find easier-to-digest summaries and announcements at