Collaborative Research: Calibrated Hypothesis Testing

NSF Award Search · 01002627DB NSF RESEARCH & RELATED ACTIVIT · $146,715

Abstract

Scientific findings should come with error rates that mean what they say: among findings assigned a 5 percent chance of error, about 5 in 100 should turn out to be wrong. This standard, called calibration, underlies trusted probability claims in fields from weather forecasting to machine learning, but it is not yet a routine part of the statistical tools used in many large-scale scientific studies. The issue arises whenever researchers must triage long lists of possible discoveries, anomalies, or published claims. In metascience, the question is which findings in the literature will replicate; in AI safety, it is which suspicious model inputs deserve closer scrutiny. Current methods control the average error rate across an entire list of discoveries, but they rarely give individual findings calibrated error probabilities. This award supports research on calibrated hypothesis testing, which aims to develop methods that separate strong evidence from borderline evidence and come with interpretable, rigorous guarantees. The work will support more reproducible science and safer data-driven systems, while training graduate researchers, developing new instructional materials, and releasing open-source software.

This project will develop theory and methodology for calibrated, large-scale inference. The framework draws upon probabilistic forecasting but addresses a distinct challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed,
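The calibration standard can be made concrete with a small simulation. The sketch below assumes a textbook two-groups model, which is an illustrative choice and not the project's method: a fraction pi0 of hypotheses are null with test statistic z ~ N(0, 1), and the rest are non-null with z ~ N(mu, 1). Under this assumed model each finding's exact posterior probability of being a false discovery (its local false discovery rate) can be computed. Because the simulation knows which findings are null, it can check what a real study cannot: whether findings assigned roughly a 5 percent chance of error are wrong about 5 times in 100. The function names and parameter values (pi0 = 0.8, mu = 3.0) are hypothetical.

```python
import math
import random


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_fdr(z: float, pi0: float, mu: float) -> float:
    """Posterior probability that a finding with statistic z is null,
    under the assumed two-groups model N(0,1) vs N(mu,1)."""
    null = pi0 * normal_pdf(z)
    alt = (1.0 - pi0) * normal_pdf(z - mu)
    return null / (null + alt)


def simulate(n: int, pi0: float, mu: float, seed: int = 0):
    """Draw n findings; return (assigned error probability, truly null) pairs.
    The ground-truth flag is available only because this is a simulation."""
    rng = random.Random(seed)
    findings = []
    for _ in range(n):
        is_null = rng.random() < pi0
        z = rng.gauss(0.0, 1.0) if is_null else rng.gauss(mu, 1.0)
        findings.append((local_fdr(z, pi0, mu), is_null))
    return findings


def calibration_check(findings, lo: float, hi: float):
    """Among findings assigned an error probability in [lo, hi), compare the
    average assigned probability with the fraction that are actually null."""
    selected = [(p, null) for p, null in findings if lo <= p < hi]
    if not selected:
        return 0, float("nan"), float("nan")
    assigned = sum(p for p, _ in selected) / len(selected)
    observed = sum(1 for _, null in selected if null) / len(selected)
    return len(selected), assigned, observed


if __name__ == "__main__":
    findings = simulate(n=200_000, pi0=0.8, mu=3.0, seed=1)
    count, assigned, observed = calibration_check(findings, 0.04, 0.06)
    print(f"{count} findings: assigned {assigned:.3f}, observed {observed:.3f}")
```

Under the assumed model the assigned and observed rates should agree up to sampling noise, which is what calibration means. In practice pi0, the alternative distribution, and the null labels are all unknown, which is the gap the abstract describes.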

Key facts

NSF award ID: 2610643
Awardee: Regents of the University of Michigan - Ann Arbor (MI)
SAM.gov UEI: GNJ7BBP73WE9
PI: Jake A Soloff
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory
Estimated total: $146,715
Funds obligated: $146,715
Transaction type: Standard Grant
Period: 06/01/2026 → 05/31/2029