Generative Models for Predictive Insights and Inference in Multimodal Data


Abstract

The project utilizes generative artificial intelligence to create high-quality synthetic data that accurately represents complex real-world information (such as medical images, financial records, and social media text) while ensuring individual privacy is protected. By providing scientists, engineers, and students with safe and realistic data sets, the project accelerates discovery, strengthens the nation’s technological workforce, and supports informed decision-making in health, commerce, and security. Additionally, the open benchmarks and instructional materials generated by the project encourage participation in data science, allowing everyone to contribute to and benefit from scientific advancements.

The research develops a unified Generative Prediction and Inference framework that combines diffusion processes, normalizing flows, and transfer learning to model joint distributions of tabular and unstructured modalities. The framework samples synthetic multimodal data to improve supervised tasks such as image captioning and question answering, delivers calibrated uncertainty estimates, and tests for hallucinations in large language models. Key contributions include algorithms for domain adaptation, reliability metrics for trustworthy AI, and agent-based tools that automate analysis of complex datasets. The resulting software and evaluation suites establish new standards for multimodal data synthesis and statistical inference.

This award reflects NSF's statutory mission

Key facts

NSF award ID: 2513668
Awardee: University of Minnesota-Twin Cities (MN)
SAM.gov UEI: KABJZBBJ4B54
PI: Xiaotong T Shen
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory, STATISTICS
Estimated total: $175,000
Funds obligated: $175,000
Transaction type: Standard Grant
Period: 08/15/2025 → 07/31/2028