Artificial intelligence (AI) systems, including modern machine learning models and large language models, are now routinely used to generate predicted labels and synthetic data across science, medicine, and policy. Researchers increasingly rely on AI-generated datasets to replace expensive or scarce human-labeled data, but the quality of such outputs is often unknown and can vary widely across AI systems. Treating AI predictions as ground truth can lead to incorrect scientific conclusions, misleading policy recommendations, and overconfident uncertainty quantification. These risks grow as AI is deployed in higher-stakes settings such as clinical research and public health. This project develops the statistical foundations needed to use AI-generated data safely in scientific analysis. By turning AI-generated data from a potential liability into a rigorously calibrated scientific resource, the project advances national priorities in science, health, and the responsible use of AI. The project also supports U.S. workforce development through training of graduate and undergraduate students, integration of research outcomes into university curricula, and the public release of open-source software that makes the methodology broadly accessible to scientists, agencies, and industry.

This project develops a unified statistical theory for the safe and adaptive integration of multiple, heterogeneous, and potentially low-quality AI-generated synthetic datasets into estimation, prediction, and