# Distributional Reinforcement Learning in the Brain

> **NIH R01** · HARVARD UNIVERSITY · 2024 · $523,057

## Abstract

The field of artificial intelligence (AI) has recently made remarkable advances, yielding new and
improved algorithms and network architectures that have proved empirically efficient in silico. These
advances raise new questions in neurobiology: are these new algorithms used in the brain? The
present study focuses on a new algorithm developed in the field of reinforcement learning (RL), called
distributional RL, which outperforms other state-of-the-art RL algorithms and is regarded as a major
advancement in RL. In environments in which rewards are probabilistic with respect to their occurrence
and size, traditional RL algorithms have focused on learning to predict a single quantity, the average
over all potential rewards. Distributional RL, by contrast, learns to predict the entire distribution over
rewards (or values) by employing multiple value predictors that together encode all possible levels of
future reward concurrently. Remarkably, theoretical work has shown that a class of distributional RL,
called ‘quantile distributional RL’, can arise out of a simple modification of traditional RL that
introduces structured variability in dopamine reward prediction error (RPE) signals.

This project sets out to test the hypothesis that the brain uses distributional RL to predict future
rewards. Aim 1 will explore the characteristics of distributional RL theoretically and make predictions
that allow for testing distributional RL in the brain. Theoretical investigations and simulations will be
used to determine how value representations in distributional RL differ from pre-existing population
coding schemes for representing probability distributions (probabilistic population codes, distributed
distributional codes, etc.). Aim 2 will examine the activity of neurons that are thought to signal RPEs
and reward expectation and test various predictions of distributional RL. Specifically, the activity of
dopamine neurons in the ventral tegmental area and neurons in the ventral striatum and orbitofrontal
cortex will be compared to key predictions of distributional RL. Aim 3 will use optogenetic
manipulation to causally demonstrate the relationship between RPE signals and distributional codes.
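
The idea that quantile distributional RL can arise from structured variability in RPE signals can be illustrated with a minimal sketch. It is not taken from the project itself: the function names, the learning rate, and the uniform reward distribution are assumptions chosen for illustration. Each value predictor is given its own asymmetry parameter `tau`, which scales positive prediction errors by `tau` and negative ones by `1 - tau`, and only the sign of the error drives the update. Under these assumptions, each predictor should drift toward the `tau`-quantile of the reward distribution, so the set of predictors jointly describes the whole distribution rather than only its mean.

```python
import random


def uniform_reward(rng):
    """Hypothetical task: reward size drawn uniformly from [0, 10]."""
    return rng.uniform(0.0, 10.0)


def learn_quantiles(sample_reward, taus, n_steps=20000, lr=0.01, seed=0):
    """Learn one value estimate per asymmetry parameter tau.

    Each predictor sees the same reward but weights positive and negative
    reward prediction errors (RPEs) differently. The update depends only on
    the sign of the RPE, which is the quantile-regression form of the rule.
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_steps):
        reward = sample_reward(rng)
        for i, tau in enumerate(taus):
            rpe = reward - values[i]  # prediction error for predictor i
            if rpe > 0:
                values[i] += lr * tau  # optimistic step, scaled by tau
            elif rpe < 0:
                values[i] -= lr * (1.0 - tau)  # pessimistic step, scaled by 1 - tau
    return values


if __name__ == "__main__":
    taus = [0.1, 0.5, 0.9]
    estimates = learn_quantiles(uniform_reward, taus)
    for tau, value in zip(taus, estimates):
        print(f"tau={tau:.1f}  learned value={value:.2f}")
```

With rewards uniform on [0, 10], the true `tau`-quantile is `10 * tau`, so pessimistic predictors (small `tau`) should settle near the low end of the range and optimistic ones (large `tau`) near the high end. A traditional RL learner corresponds to a single predictor tracking only the average.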

## Key facts

- **NIH application ID:** 10837742
- **Project number:** 5R01NS116753-03
- **Recipient organization:** HARVARD UNIVERSITY
- **Principal Investigator:** Jan Drugowitsch
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $523,057
- **Award type:** 5
- **Project period:** 2020-04-15 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10837742

## Citation

> US National Institutes of Health, RePORTER application 10837742, Distributional Reinforcement Learning in the Brain (5R01NS116753-03). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10837742. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*