# Distributional value coding and reinforcement learning in the brain

> **NIH F31** · HARVARD MEDICAL SCHOOL · 2022 · $34,515

## Abstract

Making predictions about future rewards in the environment, and taking actions to obtain those rewards, is critical
for survival. When these predictions are overly optimistic — for example, in the case of gambling addiction — or
overly pessimistic — as in anxiety and depression — maladaptive behavior can result and present a significant
disease burden. A fundamental challenge for making reward predictions is that the world is inherently stochastic,
and events on the tails of a distribution need not reflect the average. Therefore, it may be useful to predict not
only the mean, but also the complete probability distribution of upcoming rewards. Indeed, recent advances in
machine learning have demonstrated that making this shift from the average reward to the complete reward
distribution can dramatically improve performance in complex task domains. Despite its apparent complexity,
such “distributional reinforcement learning” can be achieved computationally with a remarkably simple and
biologically plausible learning rule. A recent study found that the structure of dopamine neuron activity may be
consistent with distributional reinforcement learning, but it is unknown whether additional neuronal circuitry is
involved — most notably the ventral striatum (VS) and orbitofrontal cortex (OFC), both of which receive dopamine
input and are thought to represent anticipated reward, also called “value”. Here, we propose to investigate
whether value coding in these downstream regions is consistent with distributional reinforcement learning. In
particular, we will record from these brain regions while mice perform classical conditioning with odors and water
rewards. In the first task, we will hold the mean reward constant while changing the reward variance or higher-
order moments, and ask whether neurons in the VS and OFC represent information over and above the mean,
consistent with distributional reinforcement learning. In principle, this should enable us to decode the complete
reward distribution purely from neural activity. In the second task, we will present mice with a panel of odors
predicting the same reward amount with differing probabilities. The simplicity of these Bernoulli distributions will
allow us to compare longstanding theories of population coding in the brain — that is, how probability distributions
can be instantiated in neural activity to guide behavior. In addition to high-density silicon probe recordings, we
will perform two-photon calcium imaging in these tasks to assess whether genetically and molecularly distinct
subpopulations of neurons in the striatum contribute differentially to distributional reinforcement learning. Finally,
we will combine these recordings with simultaneous imaging of dopamine dynamics in the striatum to ask how
dopamine affects striatal activity in vivo. Together, these studies will help clarify dopamine’s role in learning
distributions of reward, as well as its dysregulation in addiction, anxiety, depr...
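
The "remarkably simple" learning rule mentioned in the abstract is not spelled out there. As a rough illustration only, the sketch below assumes one common formulation from the distributional reinforcement learning literature: a population of value predictors, each scaling positive and negative prediction errors asymmetrically. The function name `learn_value_predictors`, the asymmetry parameters `taus`, the learning rate, and the reward values are all hypothetical choices for this example and are not taken from the proposal. It mimics the first task by comparing two reward distributions with the same mean but different variance.

```python
import random


def learn_value_predictors(rewards, taus, lr=0.01, n_steps=20000, seed=0):
    """Train one value predictor per asymmetry level tau (illustrative sketch).

    rewards: equally likely reward magnitudes for a single cue (assumed).
    taus:    asymmetry per predictor; tau > 0.5 is "optimistic" (assumed).
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_steps):
        reward = rng.choice(rewards)  # sample one trial outcome
        for i, tau in enumerate(taus):
            delta = reward - values[i]  # reward prediction error
            # Optimistic predictors weight positive errors more heavily,
            # pessimistic predictors weight negative errors more heavily.
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta
    return values


taus = [0.1, 0.5, 0.9]
# Two hypothetical cues with the same mean reward (5.0) but different variance.
low_var = learn_value_predictors([4.0, 6.0], taus)
high_var = learn_value_predictors([1.0, 9.0], taus)

print("low variance :", [round(v, 2) for v in low_var])
print("high variance:", [round(v, 2) for v in high_var])
```

Under these assumptions, the balanced predictor (`tau = 0.5`) settles near the shared mean for both cues, while the spread between the pessimistic and optimistic predictors widens for the higher-variance cue. This is the sense in which a population of such predictors could carry information "over and above the mean".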

## Key facts

- **NIH application ID:** 10539251
- **Project number:** 5F31NS124095-02
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** Adam Stanley Lowet
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $34,515
- **Award type:** 5
- **Project period:** 2021-08-01 → 2024-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10539251

## Citation

> US National Institutes of Health, RePORTER application 10539251, Distributional value coding and reinforcement learning in the brain (5F31NS124095-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10539251. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*