Decision dynamics during a continuous-time foraging task: a reinforcement learning approach

NIH RePORTER · NIH · F31 · $31,985

Abstract

Project Summary

Evolution has likely shaped the neural circuitry of reward systems to optimize performance in the many tasks involved in foraging for resources, a critical part of every animal's life. This proposition inspired the development of “optimal” foraging theories, such as the marginal value theorem (MVT), which analytically derive the foraging behavior (sequence of choices) that maximizes the long-term rate of reward, usually taken to be energy intake. While these analytic theories have had some success in describing animal behavior, they rely on strict assumptions about the environment that do not hold in many natural situations, and they are not flexible enough to generalize to more complicated environments or other tasks. Therefore, the end goal of this project is to understand which of a family of general-purpose decision (reinforcement-learning) algorithms is most likely to be employed by the brain to solve value-based tasks, and to use this knowledge to predict under what circumstances these algorithms will lead to optimal or suboptimal behavior.

With this project, I will improve our understanding of animal decision processes in these more natural environments by performing a foraging experiment that is continuous in time and violates many of the assumptions that prior analytical theories of foraging rely on. Rats motivated by thirst will be allowed to sample freely from two or three (palatable or aversive) tastant options (“patches”) in an open field and, critically, will be allowed to direct their encounters with the options, something that past experiments have lacked. Recordings of licking (consumption) behavior at each tastant option will allow me to measure the decision dynamics of the rat over several 1-hour sessions. In particular, I will measure how the sampling times at each option correlate with the values of the alternatives to gain insight into how rats combine the values of available options to make decisions.

As a complement to this behavioral task, I will simulate a set of reinforcement learning agents that vary in the rules used for learning action values, choosing actions, and planning actions. By quantitatively comparing the decision behavior of these artificial agents with that of rats, I will determine which of the simulated agents best reproduces the rat behavior, giving insight into the decision algorithms used by rats and providing a direction for future electrophysiological recordings during this task. Importantly, this comparison of animal behavior with that produced by artificial agents will allow me to assess how close to “optimal” rat behavior is and, where it is suboptimal, to provide quantitative explanations for why.
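
The agent-comparison idea in the abstract can be illustrated with a small sketch. It is not the project's actual model: it assumes a discrete-time simplification of the task, a simple delta-rule value update, and a softmax choice rule. The names `softmax_choice`, `simulate_agent`, and `best_matching_agent`, the parameters `alpha` (learning rate) and `beta` (choice sharpness), and the noise level are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Sample an option index with probability proportional to exp(beta * Q)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1


def simulate_agent(option_values, alpha, beta, n_steps=2000, seed=0):
    """Return the fraction of choices a delta-rule agent makes at each option.

    option_values: assumed mean payoff of each tastant option (hypothetical units).
    """
    rng = random.Random(seed)
    q = [0.0] * len(option_values)
    counts = [0] * len(option_values)
    for _ in range(n_steps):
        action = softmax_choice(q, beta, rng)
        reward = option_values[action] + rng.gauss(0.0, 0.1)  # noisy payoff
        q[action] += alpha * (reward - q[action])  # delta-rule value update
        counts[action] += 1
    return [count / n_steps for count in counts]


def best_matching_agent(observed_fractions, option_values, candidates):
    """Pick the (alpha, beta) pair whose simulated choice fractions are closest
    (in squared error) to the observed fractions."""
    def squared_error(params):
        alpha, beta = params
        simulated = simulate_agent(option_values, alpha, beta)
        return sum((s - o) ** 2 for s, o in zip(simulated, observed_fractions))
    return min(candidates, key=squared_error)
```

For example, `best_matching_agent([0.5, 0.5], [1.0, -0.5], [(0.1, 0.0), (0.1, 5.0)])` compares an agent that chooses at random (`beta = 0`) with one that strongly favors the higher-valued option. A real comparison would use richer behavioral statistics, such as sampling times, and a likelihood-based fit rather than squared error on choice fractions.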

Key facts

NIH application ID: 10129762
Project number: 5F31DA051155-02
Recipient: BRANDEIS UNIVERSITY
Principal Investigator: Benjamin Nicolaas Ballintyn
Activity code: F31
Funding institute: NIH
Fiscal year: 2021
Award amount: $31,985
Award type: 5
Project period: 2020-04-01 → 2022-08-31