# Reliable Methods for Estimation, Prediction and Causal Inference with Multiple AI-Generated Synthetic Datasets

> **NSF 01002627DB NSF RESEARCH & RELATED ACTIVIT** · University of Wisconsin-Madison (WI) · $250,000

## Abstract

Artificial intelligence (AI) systems, including modern machine learning models and large language models, are now routinely used to generate predicted labels and synthetic data across science, medicine, and policy. Researchers increasingly rely on AI-generated datasets to replace expensive or scarce human-labeled data, but the quality of such outputs is often unknown and can vary widely across AI systems. Treating AI predictions as ground truth can lead to incorrect scientific conclusions, misleading policy recommendations, and overconfident uncertainty quantification—risks that grow as AI is deployed in higher-stakes settings, such as clinical research and public health. This project develops the statistical foundations needed to use AI-generated data in scientific analysis. By turning AI-generated data from a potential liability into a rigorously calibrated scientific resource, the project advances national priorities in science, health, and the responsible use of AI. The project also supports U.S. workforce development through training of graduate and undergraduate students, integration of research outcomes into university curricula, and the public release of open-source software that makes the methodology broadly accessible to scientists, agencies, and industry.

This project develops a unified statistical theory for the safe and adaptive integration of multiple, heterogeneous, and potentially low-quality AI-generated synthetic datasets into estimation, prediction, and 

## Key facts

- **NSF award ID:** 2610561
- **Awardee organization:** University of Wisconsin-Madison (WI)
- **SAM.gov UEI:** LCLSJAGTNZQ7
- **PI:** Jiwei Zhao
- **Primary program:** 01002627DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory
- **Estimated total:** $250,000
- **Funds obligated:** $250,000
- **Transaction type:** Standard Grant
- **Period:** 07/01/2026 → 06/30/2029

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2610561

## Citation

> US National Science Foundation, Award 2610561, Reliable Methods for Estimation, Prediction and Causal Inference with Multiple AI-Generated Synthetic Datasets. Retrieved via AI Analytics 2026-06-07 from https://api.ai-analytics.org/grant/nsf/2610561. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
