A theoretical framework for probabilistic reinforcement learning in the basal ganglia

NIH RePORTER · NIH · U19 · $497,846

Abstract

According to the standard reinforcement learning framework, the basal ganglia implement estimation of long-term future reward and the control of actions to maximize future reward. Dopamine (DA) plays a central role by providing the learning signal (reward prediction error, or RPE) that guides updating of reward predictions and the action policy. Despite its success, the reinforcement learning framework has been challenged from a number of directions. Some studies have suggested that DA encodes reward predictions themselves, rather than reward prediction errors, and other studies have suggested that DA may play a role in invigorating action selection independently of its contribution to learning. A major goal of this project is to develop a reinforcement learning theory of basal ganglia function that addresses these challenges and, more broadly, presents a unifying view of how learning, probabilistic inference, and action selection work together to produce adaptive behavior. Our theoretical innovation can be divided into three components. First, we argue that cortical inputs to the striatum encode a probability distribution over hidden states, known as the belief state. Second, we argue that striatal projection neurons transform this input through a set of basis functions, whose purpose is to facilitate reward prediction. The synaptic weights that parametrize these predictions are updated based on the DA RPE signal. Third, we argue that action selection circuits in the dorsal striatum use probabilistic information about rewards to implement uncertainty-guided exploration.
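The three components can be illustrated with a minimal sketch; it is not the project's model. It assumes a two-state hidden variable, Gaussian basis functions over the belief, a TD(0) learning rule, and an upper-confidence-style exploration bonus. None of these specifics come from the abstract, and all function and parameter names (`rbf_features`, `td_update`, `ucb_choice`, `centers`, `width`, `bonus`) are hypothetical.

```python
import math

def rbf_features(belief, centers, width=0.25):
    """Map a belief state to basis-function activations.

    Assumption: two hidden states, so the belief is summarized by
    belief[0], the probability of state 0. Gaussian bumps are an
    illustrative choice of basis, not one named in the abstract.
    """
    p = belief[0]
    return [math.exp(-((p - c) ** 2) / (2 * width ** 2)) for c in centers]

def value(weights, features):
    """Linear reward prediction: weighted sum of basis activations."""
    return sum(w * f for w, f in zip(weights, features))

def td_update(weights, feats, reward, next_feats=None, alpha=0.1, gamma=0.9):
    """One TD(0) step. Returns (new_weights, rpe).

    The RPE plays the role the abstract assigns to dopamine: it scales
    the change in the weights that parametrize the prediction.
    next_feats=None marks a terminal step (future value taken as 0).
    """
    next_value = 0.0 if next_feats is None else value(weights, next_feats)
    rpe = reward + gamma * next_value - value(weights, feats)
    new_weights = [w + alpha * rpe * f for w, f in zip(weights, feats)]
    return new_weights, rpe

def ucb_choice(means, variances, bonus=1.0):
    """Pick the action with the highest mean plus uncertainty bonus.

    Assumption: an upper-confidence-style rule stands in for
    'uncertainty-guided exploration'; the abstract names no rule.
    """
    scores = [m + bonus * math.sqrt(v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)

# Usage: learn the value of a fully certain belief from a repeated reward.
centers = [0.0, 0.5, 1.0]
weights = [0.0, 0.0, 0.0]
feats = rbf_features([1.0, 0.0], centers)
for _ in range(3):
    weights, rpe = td_update(weights, feats, reward=1.0)
    # rpe shrinks across iterations as the prediction improves
```

With zero initial weights and a terminal reward of 1, the first RPE equals the reward itself, and later RPEs shrink as the prediction improves. In `ucb_choice`, a larger `bonus` tilts choice toward actions whose reward estimates are more uncertain.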

Key facts

NIH application ID: 9993580
Project number: 5U19NS113201-02
Recipient: HARVARD MEDICAL SCHOOL
Principal Investigator: Samuel J Gershman
Activity code: U19
Funding institute: NIH
Fiscal year: 2020
Award amount: $497,846
Award type: 5
Project period: 2019-08-15 → 2024-07-31