Reliable Methods for Estimation, Prediction and Causal Inference with Multiple AI-Generated Synthetic Datasets


Abstract

Artificial intelligence (AI) systems, including modern machine learning models and large language models, are now routinely used to generate predicted labels and synthetic data across science, medicine, and policy. Researchers increasingly rely on AI-generated datasets to replace expensive or scarce human-labeled data, but the quality of such outputs is often unknown and can vary widely across AI systems. Treating AI predictions as ground truth can lead to incorrect scientific conclusions, misleading policy recommendations, and overconfident uncertainty quantification, risks that grow as AI is deployed in higher-stakes settings, such as clinical research and public health. This project develops the statistical foundations needed to use AI-generated data in scientific analysis. By turning AI-generated data from a potential liability into a rigorously calibrated scientific resource, the project advances national priorities in science, health, and the responsible use of AI. The project also supports U.S. workforce development through training of graduate and undergraduate students, integration of research outcomes into university curricula, and the public release of open-source software that makes the methodology broadly accessible to scientists, agencies, and industry.

This project develops a unified statistical theory for the safe and adaptive integration of multiple, heterogeneous, and potentially low-quality AI-generated synthetic datasets into estimation, prediction, and

Key facts

NSF award ID: 2610561
Awardee: University of Wisconsin-Madison (WI)
SAM.gov UEI: LCLSJAGTNZQ7
PI: Jiwei Zhao
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory
Estimated total: $250,000
Funds obligated: $250,000
Transaction type: Standard Grant
Period: 07/01/2026 → 06/30/2029