# From Semi-supervised Learning to Prediction-powered Inference

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · University of Washington (WA) · $250,000

## Abstract

In today’s world, data are cheap, plentiful, and everywhere . . . except when they are not. The vast majority of books have been digitized, a substantial portion of the internet has been scraped, and low-cost biomedical technologies are available at a doctor’s (or patient’s) fingertips. But often the data needed for a particular real-world problem are much harder to access, due to cost or other constraints. This project will answer the question: how can the outputs of machine learning or artificial intelligence algorithms be used to augment limited datasets in order to draw meaningful statistical conclusions?

Consider a setting where the target of inference is a functional of the joint distribution of X and Y, and n independent and identically distributed observations of (X,Y) are available. This research asks: under what circumstances, by how much, and by what methods can additional observations of X alone (without the corresponding Y) improve inference? The investigative team will consider this question first from a theoretical perspective, by establishing new semi-parametric efficiency results for semi-supervised learning (Project 1); then from a methodological perspective, by developing new and improved estimators for prediction-powered inference (PPI, Project 2); and finally from an applied perspective, by proposing PPI estimators of true positive rate, false positive rate, and area under the curve (Project 3).
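For orientation, here is a minimal sketch of the basic prediction-powered estimator of a mean E[Y] from the original PPI literature; it is not one of the new estimators this project proposes. It assumes a fixed prediction model f, n labeled pairs (X_i, Y_i), and N unlabeled inputs. The estimate is the average of f over the N unlabeled points minus a "rectifier", the average prediction error f(X_i) − Y_i on the labeled points. Because the two samples are independent, the variances add, giving a normal-approximation confidence interval. The function names `ppi_mean` and `classical_mean` are hypothetical, chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def ppi_mean(y_labeled, preds_labeled, preds_unlabeled, alpha=0.05):
    """Prediction-powered point estimate and CI for E[Y] (illustrative sketch).

    y_labeled:       outcomes Y_i on the n labeled points
    preds_labeled:   model predictions f(X_i) on the same n points
    preds_unlabeled: model predictions f(X~_j) on the N unlabeled points
    """
    n, N = len(y_labeled), len(preds_unlabeled)
    if n != len(preds_labeled):
        raise ValueError("labels and labeled predictions must align")
    # Rectifier: average prediction error, measured where Y is observed.
    residuals = [f - y for f, y in zip(preds_labeled, y_labeled)]
    rectifier = fmean(residuals)
    # Debias the mean prediction on the large unlabeled sample.
    theta = fmean(preds_unlabeled) - rectifier
    # Independent samples, so the two variance contributions add.
    se = math.sqrt(variance(preds_unlabeled) / N + variance(residuals) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


def classical_mean(y_labeled, alpha=0.05):
    """Labeled-data-only estimate and CI, for comparison."""
    n = len(y_labeled)
    theta = fmean(y_labeled)
    se = math.sqrt(variance(y_labeled) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical simulation: Y = X + noise; predictor f(X) = X + 0.3 is biased.
    def draw(m):
        xs = [rng.gauss(0, 1) for _ in range(m)]
        ys = [x + rng.gauss(0, 0.5) for x in xs]
        return xs, ys

    def f(x):
        return x + 0.3

    x_lab, y_lab = draw(200)
    x_unl, _ = draw(20_000)  # in practice these Y values are unobserved

    est, ci = ppi_mean(y_lab, [f(x) for x in x_lab], [f(x) for x in x_unl])
    print("PPI:      ", round(est, 3), [round(c, 3) for c in ci])
    est, ci = classical_mean(y_lab)
    print("classical:", round(est, 3), [round(c, 3) for c in ci])
```

In this simulation the predictor is biased, yet the rectifier removes the bias. Because f explains most of the variance in Y, the PPI interval comes out narrower than the labeled-only interval. This is the kind of gain whose size and limits the project's questions ("under what circumstances, by how much") concern.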

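This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.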

## Key facts

- **NSF award ID:** 2514344
- **Awardee organization:** University of Washington (WA)
- **SAM.gov UEI:** HD1WMN6945W6
- **PI:** Daniela Witten
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Machine Learning Theory, Artificial Intelligence (AI)
- **Estimated total:** $250,000
- **Funds obligated:** $250,000
- **Transaction type:** Standard Grant
- **Period:** 09/15/2025 → 08/31/2028

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2514344

## Citation

> US National Science Foundation, Award 2514344, From Semi-supervised Learning to Prediction-powered Inference. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2514344. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
