# Unsupervised Learning and Nonlinear Dimension Reduction: Advances with Optimal Transport, Empirical Bayes, and Variational Inference

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · Columbia University (NY) · $240,000

## Abstract

Modern scientific data sets, ranging from single-cell RNA sequencing with tens of thousands of genes per patient, to galaxy-survey spectra with millions of stars, to user-item interaction matrices in online platforms, share two features: (i) ultra-high dimensionality and (ii) latent parameters that obey common structural laws (e.g., exchangeability, sparsity, or low-rank dependence). This project tackles both challenges at once. It advances the statistical foundations for such problems by (1) providing a new theoretical framework for studying empirical Bayes methods, which learn the latent-parameter distribution directly from the data, in these complex models, and (2) developing new unsupervised dimension-reduction techniques that embed high-dimensional observations into lower-dimensional representations while preserving the essential structure and relationships within the data. Together, these tools will turn ad hoc prior modeling into an objective, data-driven procedure and yield principled, scalable inference for large-scale applications. Collaborations with astronomers will ensure immediate scientific impact, and several of the research directions will shape the Ph.D. dissertations of multiple Columbia graduate students, fostering the next generation of data-science leaders.
The project integrates two tightly linked research thrusts. (a) Building on recent advances in nonparametric empirical Bayes, the PI will design flexible empirical Bayes estimators

## Key facts

- **NSF award ID:** 2515520
- **Awardee organization:** Columbia University (NY)
- **SAM.gov UEI:** F4N1QNPB95M4
- **PI:** Bodhisattva Sen
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory
- **Estimated total:** $240,000
- **Funds obligated:** $240,000
- **Transaction type:** Standard Grant
- **Period:** 07/01/2025 → 06/30/2028

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2515520

## Citation

> US National Science Foundation, Award 2515520, Unsupervised Learning and Nonlinear Dimension Reduction: Advances with Optimal Transport, Empirical Bayes, and Variational Inference. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2515520. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
