Unsupervised Learning and Nonlinear Dimension Reduction: Advances with Optimal Transport, Empirical Bayes, and Variational Inference

Abstract

Modern scientific data sets, ranging from single-cell RNA sequencing with tens of thousands of genes per patient, to galaxy-survey spectra with millions of stars, to user-item interaction matrices in online platforms, share two features: (i) ultra-high dimensionality and (ii) latent parameters that obey common structural laws (e.g., exchangeability, sparsity, or low-rank dependence). This project addresses both features at once. It advances the statistical foundations for such problems by (1) providing a new framework for the theoretical study of empirical Bayes methods that, in these complex models, learn the latent-parameter distribution directly from the data, and (2) developing unsupervised dimension-reduction techniques that embed high-dimensional observations into lower-dimensional representations while preserving the essential structure and relationships within the data. Together, these tools will turn ad hoc prior modeling into an objective, data-driven procedure and yield principled, scalable inference for large-scale applications. In addition, collaborations with astronomers will ensure immediate scientific impact, and several of the research directions will shape the Ph.D. dissertations of multiple Columbia graduate students, fostering the next generation of data-science leaders.

The project integrates two tightly linked research thrusts. (a) Building on recent advances in nonparametric empirical Bayes, the PI will design flexible empirical Bayes estimators

Key facts

NSF award ID: 2515520
Awardee: Columbia University (NY)
SAM.gov UEI: F4N1QNPB95M4
PI: Bodhisattva Sen
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Machine Learning Theory
Estimated total: $240,000
Funds obligated: $240,000
Transaction type: Standard Grant
Period: 07/01/2025 → 06/30/2028