# Non-parametric estimation for multimodal data: From statistical theory to efficient algorithms

> **NSF 01002627DB NSF RESEARCH & RELATED ACTIVIT** · University of California-Los Angeles (CA) · $139,995

## Abstract

Multimodal datasets, which combine sources such as medical imaging, clinical records, and genetic information, have the potential to significantly advance our understanding of complex systems and improve health outcomes. However, their heterogeneity and high dimensionality, together with a lack of reliable statistical tools, often lead to unstable analyses or misleading conclusions. These issues, along with the limited ability to rigorously quantify uncertainty or disentangle relationships among data sources, pose a major barrier to adopting data-driven methods in high-stakes settings where errors can be costly (e.g., clinical decision-making, disease monitoring, and health policy). This research project will develop robust, scalable, and statistically principled methods for integrating and analyzing multimodal data, with a particular emphasis on uncertainty quantification. The project also integrates research and education through: (a) the involvement of undergraduate students, graduate students, and postdoctoral researchers in both research and dissemination, along with mentoring to support their continued professional development; (b) the integration of research findings into UCLA courses and openly accessible online materials; and (c) workshops and outreach activities designed to broaden participation in data science.

In more detail, the research focuses on nonparametric estimation and uncertainty quantification for multimodal data, in which multiple high-dimensional and heterogeneous data sources must be integrated to enable reliable inference. The initial goal is to develop robust and scalable methods for estimating the effects of individual modalities, using deep learning to model auxiliary structures and kernel-based techniques to provide uncertainty quantification. Building on these methods, the follow-up goal is to construct machine learning-powered estimators that identify and quantify the pathways through which modalities influence outcomes, co…

## Key facts

- **NSF award ID:** 2515903
- **Awardee organization:** University of California-Los Angeles (CA)
- **SAM.gov UEI:** RN64EPNH8JC6
- **PI:** Xiaowu Dai
- **Primary program:** 01002627DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory
- **Estimated total:** $139,995
- **Funds obligated:** $139,995
- **Transaction type:** Standard Grant
- **Period:** 06/01/2026 → 05/31/2029

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2515903

## Citation

> US National Science Foundation, Award 2515903, Non-parametric estimation for multimodal data: From statistical theory to efficient algorithms. Retrieved via AI Analytics 2026-06-26 from https://api.ai-analytics.org/grant/nsf/2515903. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
