# Distributional Reinforcement Learning for Risk-Sensitive Sequential Decision Making: New Theory and Methods

> **NSF 01002627DB NSF RESEARCH & RELATED ACTIVIT** · University of Miami (FL) · $299,601

## Abstract

Many critical online decision systems, including clinical support, financial risk management, and autonomous technologies, must look beyond average performance to avoid rare but catastrophic "tail events." Traditional reinforcement learning often summarizes future outcomes as a single expected value, which masks significant risks and uncertainty. This research addresses this limitation by developing distributional reinforcement learning methods that learn the full range of possible outcomes to support safer, risk-aware, and privacy-preserving decision-making. By improving the trustworthiness of systems in health, finance, and operations, this work strengthens the intersection of machine learning, artificial intelligence, and statistics while promoting the responsible use of sensitive individual data. Additionally, the project supports education by training students at the intersection of statistics, machine learning, optimization, and responsible artificial intelligence.

The research focuses on quantile temporal difference learning, a scalable model-free method for estimating return quantiles from observed transitions. First, the project will establish finite-time guarantees for quantile temporal difference learning in both synchronous settings and asynchronous settings with Markovian data, including bounds for quantile estimation error and for the accuracy of the estimated return distribution. Second, the project will develop statistical inference methods for distribution
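To illustrate the kind of estimator the abstract refers to, here is a minimal sketch of a tabular quantile temporal difference update, assuming the form commonly given in the distributional reinforcement learning literature. The award record does not specify the project's algorithm, so the function names (`qtd_update`, `demo`), the midpoint quantile levels, the step size, and the toy reward process are all illustrative assumptions.

```python
import numpy as np


def qtd_update(theta, s, r, s_next, gamma, alpha, done=False):
    """Apply one tabular QTD step in place (hypothetical helper).

    theta: float array of shape (n_states, m); theta[s, i] estimates the
    tau_i-quantile of the return from state s, with tau_i = (2i + 1) / (2m).
    """
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)  # assumed midpoint quantile levels
    # Bootstrapped targets: one per quantile estimate at the next state.
    if done:
        targets = np.full(m, float(r))
    else:
        targets = r + gamma * theta[s_next]
    # below[i, j] is True when target j falls below the current estimate i.
    below = targets[None, :] < theta[s][:, None]
    # Quantile-regression style step: raise estimate i with weight tau_i,
    # lower it by the fraction of targets that fall below it.
    theta[s] += alpha * (taus - below.mean(axis=1))
    return theta


def demo(n_steps=20000, m=5, alpha=0.01, seed=0):
    """Toy check: one state, terminal reward drawn from N(0, 1)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((1, m))
    for _ in range(n_steps):
        qtd_update(theta, 0, rng.normal(), 0, gamma=0.9, alpha=alpha, done=True)
    return theta[0]
```

In the toy `demo`, each entry of the returned vector should drift toward the corresponding quantile of the standard normal reward, with residual noise governed by the constant step size `alpha`. The finite-time guarantees described in the abstract concern how quickly and how accurately updates of this kind converge; the sketch makes no claim about those rates.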

## Key facts

- **NSF award ID:** 2610563
- **Awardee organization:** University of Miami (FL)
- **SAM.gov UEI:** RQMFJGDTQ5V3
- **PI:** Lan Wang
- **Primary program:** 01002627DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory
- **Estimated total:** $299,601
- **Funds obligated:** $299,601
- **Transaction type:** Standard Grant
- **Period:** 07/01/2026 → 06/30/2029

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2610563

## Citation

> US National Science Foundation, Award 2610563, Distributional Reinforcement Learning for Risk-Sensitive Sequential Decision Making: New Theory and Methods. Retrieved via AI Analytics 2026-06-07 from https://api.ai-analytics.org/grant/nsf/2610563. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*