# Statistical Frameworks for Self-Supervised Representation Learning and Their Biomedical Applications

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · University of Illinois at Urbana-Champaign (IL) · $175,000

## Abstract

Recent large-scale machine learning models have shown impressive capabilities, but they often rely on hundreds of millions of labeled samples. In many fields, however, obtaining high-quality labels is extremely costly, so most available data remain unlabeled. For example, although millions of images and videos can be easily collected from social media platforms, labeling them manually is tedious and time-consuming. To address the challenge of limited labeled data, self-supervised representation learning has emerged as a promising approach in computer vision and natural language processing, and it has already played a key role in the success of recent large language models. Despite its strong performance in practice, the theoretical understanding of self-supervised representation learning remains limited. Scarce labeled data is also a problem in biomedical research, but existing self-supervised methods cannot be applied directly because of the distinctive nature of biomedical datasets. This project aims to address these gaps by developing new theoretical frameworks for self-supervised representation learning, along with computational tools tailored to biomedical studies. It also includes educational efforts to engage students and the broader public with this growing area of research.

This project aims to advance the theoretical foundations of self-supervised representation learning and transform how unlabeled data are utilized in biomedical…

## Key facts

- **NSF award ID:** 2515171
- **Awardee organization:** University of Illinois at Urbana-Champaign (IL)
- **SAM.gov UEI:** Y8CWNJRCNN91
- **PI:** Shulei Wang
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory, Biotechnology
- **Estimated total:** $175,000
- **Funds obligated:** $175,000
- **Transaction type:** Standard Grant
- **Period:** 07/01/2025 → 06/30/2028

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2515171

## Citation

> US National Science Foundation, Award 2515171, Statistical Frameworks for Self-Supervised Representation Learning and Their Biomedical Applications. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2515171. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
