This project develops mathematical tools for assessing when artificial intelligence (AI) systems can be trusted after their predictions are used to make decisions. Modern AI tools help rank drug candidates, support medical decisions, screen large data sets, and suggest scientific hypotheses. In these settings, predictions are often used selectively and repeatedly: users may follow up only on top-ranked cases, choose confidence levels after seeing outputs, or let automated tools gather evidence over time. Standard uncertainty statements can become overly optimistic in such adaptive pipelines. This project will build a statistical quality-control layer for AI-assisted decisions and discoveries, helping users quantify how reliable the resulting conclusions are. The work can improve reproducibility and efficiency in biomedical science, drug discovery, and other data-intensive fields where experiments are costly and errors slow progress. The project will also train graduate and undergraduate students in modern statistics, trustworthy AI, and responsible data science, with efforts to broaden participation. Publicly available software, benchmarks, and teaching materials will support education, reproducible research, and safer use of AI.

The technical goal of this project is to develop finite-sample, model-agnostic, distribution-free inference methods for AI systems used inside adaptive decision and discovery pipelines. The work draws on conformal prediction, predictive inference, selective inference, multiple testing, permutation methods, and anytime-valid testing. The first thrust will develop set-level predictive inference methods for multiple unlabeled instances, including false discovery rate (FDR) control, family-wise error rate control, global null testing, partial-conjunction testing, and model selection under exchangeability and weighted exchangeability; a sketch of one such procedure appears below. The second thrust will study predictive inference under adaptive human use, including the selective issuance of predictions.
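To make the flavor of the first thrust concrete, the following is a minimal sketch, not the project's proposed method: conformal p-values are computed against a held-out calibration set, then the Benjamini-Hochberg (BH) procedure flags unlabeled test instances at a target FDR level. The score model, function names, and toy data are illustrative assumptions.

import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-value for each test point: if a test point is
    exchangeable with the calibration scores, its p-value is (super-)uniform."""
    n = len(calib_scores)
    # p_j = (1 + #{i : s_i >= s_j}) / (n + 1); larger score = more anomalous
    counts = (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1 + counts) / (n + 1)

def benjamini_hochberg(pvals, q):
    """Indices rejected by the BH procedure at target FDR level q."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0]) + 1  # largest i with p_(i) <= q*i/m
    return order[:k]

rng = np.random.default_rng(0)
calib_scores = rng.normal(size=500)                       # scores on reference (null) data
test_scores = np.concatenate([rng.normal(size=90),        # 90 null test points
                              rng.normal(3.0, 1.0, 10)])  # 10 anomalous signals
pvals = conformal_pvalues(calib_scores, test_scores)
print("flagged instances:", benjamini_hochberg(pvals, q=0.1))

Conformal p-values of this form are valid in finite samples under exchangeability, and applying BH to them yields FDR control; the project's actual procedures and guarantees may differ from this toy pipeline.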
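The abstract also lists anytime-valid testing as an ingredient. As one hedged illustration, again not the project's method, a nonnegative test martingale lets evidence be monitored continuously without inflating the type-I error, via Ville's inequality. The Bernoulli null and the betting rate lam below are illustrative choices.

import numpy as np

def anytime_valid_test(xs, alpha=0.05, lam=0.5):
    """Test H0: P(X = 1) = 1/2 on a binary stream, with type-I error
    at most alpha no matter when (or whether) monitoring stops."""
    wealth = 1.0
    for t, x in enumerate(xs, start=1):
        # Under H0, E[1 + lam*(2x - 1)] = 1, so wealth is a nonnegative martingale.
        wealth *= 1.0 + lam * (2 * x - 1)
        if wealth >= 1.0 / alpha:   # Ville: P(sup_t wealth_t >= 1/alpha) <= alpha
            return t                # reject H0 at time t
    return None                     # never rejected

rng = np.random.default_rng(1)
biased_stream = (rng.random(1000) < 0.7).astype(int)  # data violating H0
print("rejected at step:", anytime_valid_test(biased_stream))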