Many critical online decision systems, including clinical support, financial risk management, and autonomous technologies, must look beyond average performance to avoid rare but catastrophic "tail events." Traditional reinforcement learning typically summarizes future outcomes as a single expected value, which masks risk and uncertainty. This project addresses that limitation by developing distributional reinforcement learning methods that learn the full distribution of possible outcomes, supporting safer, risk-aware, and privacy-preserving decision-making. By improving the trustworthiness of systems in health, finance, and operations, the work strengthens the connections among machine learning, artificial intelligence, and statistics while promoting the responsible use of sensitive individual data. The project also supports education by training students at the intersection of statistics, machine learning, optimization, and responsible artificial intelligence.

The research focuses on quantile temporal difference learning, a scalable model-free method for estimating return quantiles from observed transitions. First, the project will establish finite-time guarantees for quantile temporal difference learning in both the synchronous setting and the asynchronous setting with Markovian data, including bounds on the quantile estimation error and on the accuracy of the estimated return distribution. Second, the project will develop statistical inference methods for distribution