Mammals continuously adapt action selection in noisy and volatile environments to maximize the success of future decisions, either by selecting actions that are likely to return a desirable result (exploitation) or by taking a risk on something new that might produce a better outcome (exploration). This flexible decision-making is mediated by cortico-basal-ganglia-thalamic (CBGT) circuits that both control action selection and use feedback signals to modify the approach to future decisions (i.e., undergo reinforcement learning; RL). Dysfunction in how these pathways use feedback to guide future decisions is a primary mechanism for many addictive behaviors (e.g., opioid addiction, obesity). Although decision-making and RL originate from a common neural substrate, they are generally studied as independent processes. Understanding the unified nature of action selection and learning requires a careful re-evaluation of how cognitive algorithms emerge from the circuit-level dynamics of CBGT networks. We propose a series of empirical and theoretical investigations that bridge levels of analysis to unify algorithmic models of learning and decision-making, in order to understand how CBGT networks use feedback to manage the trade-off between exploration and exploitation. Our first step toward this goal will be to develop a computational “upwards mapping” framework that links cognitive process models with biologically realistic spiking models of CBGT networks, under constraints imposed by existing behavioral observations from a set of adaptive decision-making experiments. This approach will allow us to derive testable predictions about how different CBGT network properties (e.g., population activity levels or pathway connection strengths) scale cognitive processes (e.g., evidence accumulation rate) to produce distinct phenotypes of decision policies (Specific Aim 1a).
Using this paradigm we will also generate predictions about how, under changing conditions, neural plasticity mechanisms can adaptively shift CBGT networks into distinct states that manage the exploration-exploitation trade-off in contextually appropriate ways (Specific Aim 1b). Predictions will be tested experimentally using recordings in multiple key CBGT sites as well as optogenetic perturbation of striatal and subthalamic nucleus targets in rodents performing a 2-armed bandit task with static or variable action-outcome contingencies (Specific Aim 2).

RELEVANCE: Dysfunction in how the brain uses feedback to guide future decisions is a primary mechanism for many public health problems (e.g., addiction, cardiovascular disease). This research program will provide new insights into how neural circuits give rise to decision-making in humans and other mammals and how environmental contexts (e.g., volatility of reward schedules) regulate brain network configurations to produce behavioral flexibility. This information can provide ke...