Random Matrix Theory and Manifold Learning for High-Dimensional Data Integration

Abstract

This project develops new mathematical and computational tools for integrating high-dimensional datasets with partially shared structures, a challenge that arises across various fields, including molecular biology, precision medicine, business analytics, and economics. When data are collected from multiple sources (such as different individuals, experimental conditions, or technologies), joint analysis can reveal complex patterns that would be missed if each dataset were analyzed in isolation. However, existing methods often struggle to distinguish meaningful signals from noise, particularly when the data are high-dimensional and heterogeneous. This project addresses these limitations by creating a principled framework to uncover and align shared low-dimensional structures across datasets, ultimately enabling more accurate, interpretable, and biologically relevant insights. The project will also contribute to the broader community by developing open-source software tools and offering interdisciplinary training opportunities for students at various levels.

This project will build new theoretical foundations and methods at the intersection of random matrix theory, manifold learning, and high-dimensional statistics, and it is closely related to artificial intelligence. Key contributions include new results in random matrix theory for composite and kernel matrices formed from multiple datasets, a Procrustes-based framework for aligning low-dimensional structures in high-dimensio

Key facts

NSF award ID: 2515684
Awardee: Harvard University (MA)
SAM.gov UEI: LN53LCFJFL45
PI: Rong Ma
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory
Estimated total: $175,000
Funds obligated: $175,000
Transaction type: Standard Grant
Period: 07/01/2026 → 06/30/2029