Statistical Frameworks for Self-Supervised Representation Learning and Their Biomedical Applications

NSF Award Search · 01002526DB NSF RESEARCH & RELATED ACTIVIT · $175,000

Abstract

While recent advancements in large-scale machine learning models have shown impressive capabilities, they often rely on hundreds of millions of labeled samples. However, obtaining high-quality labels in many fields is extremely costly, so most available data remain unlabeled. For example, although millions of images and videos can be easily collected from social media platforms, manually labeling them is a tedious and time-consuming process. To address the challenge of limited labeled data, self-supervised representation learning has emerged as a promising approach in computer vision and natural language processing. It has already played a key role in the success of recent large language models. Despite its strong performance in practice, the theoretical understanding of self-supervised representation learning remains limited. Moreover, the problem of scarce labeled data also affects biomedical research, but the existing self-supervised methods cannot be directly applied due to the unique nature of biomedical datasets. This project aims to address these gaps by developing new theoretical frameworks for self-supervised representation learning, along with computational tools tailored to biomedical studies. It also includes educational efforts to engage students and the broader public with this growing area of research. This project aims to advance the theoretical foundations of self-supervised representation learning and transform how unlabeled data are utilized in biomedica

Key facts

NSF award ID: 2515171
Awardee: University of Illinois at Urbana-Champaign (IL)
SAM.gov UEI: Y8CWNJRCNN91
PI: Shulei Wang
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory, Biotechnology
Estimated total: $175,000
Funds obligated: $175,000
Transaction type: Standard Grant
Period: 07/01/2025 → 06/30/2028