Distributional reinforcement learning in the brain.

NIH RePORTER · NIH · R01 · $1,784,522

Abstract

Project Summary: The field of artificial intelligence (AI) has recently made remarkable advances, yielding new and improved algorithms and network architectures that have proved empirically efficient in silico. These advances raise new questions in neurobiology: are these new algorithms used in the brain? The present study focuses on a new algorithm developed in the field of reinforcement learning (RL), called distributional RL, which outperforms other state-of-the-art RL algorithms and is regarded as a major advancement in RL. In environments in which rewards are probabilistic with respect to their occurrence and size, traditional RL algorithms have focused on learning to predict a single quantity, the average over all potential rewards. Distributional RL, by contrast, learns to predict the entire distribution over rewards (or values) by employing multiple value predictors that together encode all possible levels of future reward concurrently. Remarkably, theoretical work has shown that a class of distributional RL, called ‘quantile distributional RL’, can arise out of a simple modification of traditional RL that introduces structured variability in dopamine reward prediction error (RPE) signals. This project sets out to test the hypothesis that the brain utilizes distributional RL to predict future rewards. Aim 1 will explore the characteristics of distributional RL theoretically and make predictions that allow for testing distributional RL in the brain. Theoretical investigations and simulations will be used to determine how value representations in distributional RL differ from pre-existing population coding schemes for representing probability distributions (probabilistic population codes, distributed distributional codes, etc.). Aim 2 will examine the activity of neurons that are thought to signal RPEs and reward expectation and test various predictions of distributional RL. Specifically, the activity of dopamine neurons in the ventral tegmental area and neurons in the ventral striatum and orbitofrontal cortex will be compared to key predictions of distributional RL. Aim 3 will use optogenetic manipulation to causally demonstrate the relationship between RPE signals and distributional codes.
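
As a rough illustration of the quantile idea mentioned in the abstract, and not the project's actual model, the Python sketch below trains several value predictors on the same stream of probabilistic rewards. Each predictor scales positive and negative prediction errors differently, which is one simple way to read "structured variability" in RPE signals. The function name, the asymmetry parameters (taus), the learning rate, and the uniform reward distribution are all assumptions made for this example.

```python
import random


def learn_quantile_values(sample_reward, taus, alpha=0.02, steps=20000, seed=0):
    """Train one value predictor per asymmetry level tau (hypothetical helper).

    Each predictor moves up by alpha * tau when the reward exceeds its
    estimate (positive prediction error) and down by alpha * (1 - tau)
    when the reward falls short (negative prediction error). Only the sign
    of the error is used, so each estimate drifts toward the tau-quantile
    of the reward distribution.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(steps):
        reward = sample_reward(rng)  # all predictors see the same outcome
        for i, tau in enumerate(taus):
            if reward > values[i]:
                values[i] += alpha * tau
            elif reward < values[i]:
                values[i] -= alpha * (1.0 - tau)
    return values


def sample_reward(rng):
    # Assumed reward distribution for illustration: uniform on [0, 10).
    return rng.uniform(0.0, 10.0)


# Pessimistic, neutral, and optimistic predictors (assumed values).
taus = [0.1, 0.5, 0.9]
values = learn_quantile_values(sample_reward, taus)
for tau, value in zip(taus, values):
    print(f"tau={tau:.1f} -> learned value {value:.2f}")
```

Under these assumptions the three estimates should settle near the 10th, 50th, and 90th percentiles of the reward distribution, so the set of predictors jointly describes the spread of outcomes. A conventional single predictor would track only the average.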

Key facts

NIH application ID: 9978224
Project number: 1R01NS116753-01
Recipient: HARVARD UNIVERSITY
Principal Investigator: Jan Drugowitsch
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $1,784,522
Award type: 1
Project period: 2020-04-15 → 2023-05-31