# Statistical Methods for Ultrahigh-dimensional Biomedical Data

> **NIH R01** · PRINCETON UNIVERSITY · 2021 · $293,003

## Abstract

This proposal develops novel statistical and machine learning methods for distributed analysis of
big data in biomedical studies and precision medicine, and for selecting a small group of
molecules associated with biological and clinical outcomes from high-throughput data
such as microarray, proteomic, and next-generation sequencing data from biomedical research,
especially autism and Alzheimer’s disease studies. It focuses on developing
efficient distributed statistical methods for big data computing, storage, and communication,
and for analyzing distributed health data collected at different locations that are hard to aggregate
in meta-analysis because of privacy and ownership concerns. It develops both computationally and
statistically efficient methods and valid statistical tools for exploring heterogeneity of big data in
precision medicine, for studying associations of genomic and genetic information with clinical
and biological outcomes, for feature selection and model building in the presence of
errors-in-variables, endogeneity, and heavy-tailed error distributions, and for predicting clinical outcomes
and understanding molecular mechanisms. It introduces more robust and powerful statistical
tests for selection of significant genes, SNPs, and proteins in the presence of data dependence,
valid control of the false discovery rate for dependent test statistics, and evaluation of treatment
effects on a group of molecules. The strengths and weaknesses of each proposed method will be
critically analyzed via theoretical investigations and simulation studies. Related software will be
developed for free dissemination. Data sets from ongoing autism research, Alzheimer’s disease research,
and other biomedical studies will be analyzed using the newly developed methods, and the
results will be further confirmed and investigated biologically. The research findings will have
a strong impact on statistical analysis of high-throughput big data for biomedical research and on
understanding heterogeneity for precision medicine and the molecular mechanisms of autism,
Alzheimer’s disease, and other diseases.

## Key facts

- **NIH application ID:** 10093056
- **Project number:** 5R01GM072611-16
- **Recipient organization:** PRINCETON UNIVERSITY
- **Principal Investigator:** Matias Damian Cattaneo
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $293,003
- **Award type:** 5
- **Project period:** 2006-02-01 → 2023-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10093056

## Citation

> US National Institutes of Health, RePORTER application 10093056, Statistical Methods for Ultrahigh-dimensional Biomedical Data (5R01GM072611-16). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10093056. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
