Distributional value coding and reinforcement learning in the brain

NIH RePORTER · NIH · F31 · $34,515

Abstract

Making predictions about future rewards in the environment, and taking actions to obtain those rewards, is critical for survival. When these predictions are overly optimistic (for example, in gambling addiction) or overly pessimistic (as in anxiety and depression), maladaptive behavior can result and present a significant disease burden. A fundamental challenge for making reward predictions is that the world is inherently stochastic, and events on the tails of a distribution need not reflect the average. Therefore, it may be useful to predict not only the mean, but also the complete probability distribution of upcoming rewards. Indeed, recent advances in machine learning have demonstrated that making this shift from the average reward to the complete reward distribution can dramatically improve performance in complex task domains. Despite its apparent complexity, such “distributional reinforcement learning” can be achieved computationally with a remarkably simple and biologically plausible learning rule. A recent study found that the structure of dopamine neuron activity may be consistent with distributional reinforcement learning, but it is unknown whether additional neuronal circuitry is involved, most notably the ventral striatum (VS) and orbitofrontal cortex (OFC), both of which receive dopamine input and are thought to represent anticipated reward, also called “value”.

Here, we propose to investigate whether value coding in these downstream regions is consistent with distributional reinforcement learning. In particular, we will record from these brain regions while mice perform classical conditioning with odors and water rewards. In the first task, we will hold the mean reward constant while changing the reward variance or higher-order moments, and ask whether neurons in the VS and OFC represent information over and above the mean, consistent with distributional reinforcement learning. In principle, this should enable us to decode the complete reward distribution purely from neural activity. In the second task, we will present mice with a panel of odors predicting the same reward amount with differing probabilities. The simplicity of these Bernoulli distributions will allow us to compare longstanding theories of population coding in the brain, that is, how probability distributions can be instantiated in neural activity to guide behavior. In addition to high-density silicon probe recordings, we will perform two-photon calcium imaging in these tasks to assess whether genetically and molecularly distinct subpopulations of neurons in the striatum contribute differentially to distributional reinforcement learning. Finally, we will combine these recordings with simultaneous imaging of dopamine dynamics in the striatum to ask how dopamine affects striatal activity in vivo. Together, these studies will help clarify dopamine’s role in learning distributions of reward, as well as its dysregulation in addiction, anxiety, depr...
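
The abstract does not spell out the "remarkably simple" learning rule it refers to. As an illustration only, the sketch below assumes one commonly described form: a population of value predictors that each scale positive and negative reward prediction errors differently, so that optimistic and pessimistic units settle at different points of the reward distribution. The function name, the asymmetry parameters `taus`, the learning rate, and the reward magnitudes are all hypothetical choices for this example and are not taken from the proposal.

```python
import numpy as np


def learn_value_predictors(rewards, taus, lr=0.005):
    """Train a population of value predictors with asymmetric learning rates.

    Assumption (not from the abstract): each predictor scales positive
    prediction errors by tau and negative ones by (1 - tau).
    """
    taus = np.asarray(taus, dtype=float)
    values = np.zeros_like(taus)
    for r in rewards:
        delta = r - values  # reward prediction error of each predictor
        # Optimistic units (high tau) weight positive errors more heavily;
        # pessimistic units (low tau) weight negative errors more heavily.
        scale = np.where(delta > 0, taus, 1.0 - taus)
        values += lr * scale * delta
    return values


# Hypothetical Bernoulli reward: 4 units of water with probability 0.5, else 0.
rng = np.random.default_rng(0)
rewards = 4.0 * rng.binomial(1, 0.5, size=40000)

taus = np.array([0.1, 0.5, 0.9])
values = learn_value_predictors(rewards, taus)
print(values)
```

Under these assumptions each predictor converges toward an expectile of the reward distribution instead of its mean. For this two-outcome example the predictor with asymmetry `tau` should settle near `4 * tau`, so the spread of the population carries information about the distribution that a single mean estimate would discard.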

Key facts

NIH application ID: 10539251
Project number: 5F31NS124095-02
Recipient: HARVARD MEDICAL SCHOOL
Principal Investigator: Adam Stanley Lowet
Activity code: F31
Funding institute: NIH
Fiscal year: 2022
Award amount: $34,515
Award type: 5
Project period: 2021-08-01 → 2024-07-31