# Novel Statistical Inference for Biomedical Big Data

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $456,980

## Abstract

This project develops novel statistical inference procedures for biomedical big data (BBD), including data from diverse omics platforms, various medical imaging technologies and electronic health records. Statistical inference, i.e., assessing uncertainty, statistical significance and confidence, is a key step in computational pipelines that aim to discover new disease mechanisms and develop effective treatments using BBD. However, the development of statistical inference procedures for BBD has lagged behind technological advances. While point estimation and variable selection procedures for BBD have matured over the past two decades, existing inference procedures are limited to simple methods for marginal inference, lack the ability to integrate biomedical data across multiple studies and platforms, or both. This gap is, in large part, due to the challenges of statistical inference in high-dimensional models, where the number of features is considerably larger than the number of subjects in the study.

Motivated by our team's extensive and complementary expertise in analyzing multi-omics data from heterogeneous studies, including the TOPMed project on which multiple team members currently collaborate, this proposal aims to address these challenges. The first aim develops a novel inference procedure for conditional parameters in high-dimensional models based on dimension reduction, which facilitates seamless integration of external biological information, as well as biomedical data across multiple studies and platforms. To extend this method to the very high-dimensional models that arise in BBD applications, the second aim develops a data-adaptive screening procedure for selecting an optimal subset of relevant variables. The third aim develops a novel inference procedure for high-dimensional mixed linear models, expanding the application domain of high-dimensional inference procedures to studies with longitudinal data and repeated measures, which arise commonly in biomedical applications. The fourth aim develops a novel data-driven procedure for controlling the false discovery rate (FDR), which facilitates the integration of evidence from multiple BBD sources while minimizing the false negative rate (FNR) for optimal discovery.

After the methods are evaluated through extensive simulation experiments and application to multi-omics data from the TOPMed project, the final aim implements them in easy-to-use open-source software tools built on the R programming language and the Galaxy workflow system, providing an expandable platform for further development of BBD methods and tools.
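The abstract does not describe the proposed FDR procedure itself. For orientation only, the classical Benjamini-Hochberg (BH) step-up procedure, a common baseline for FDR control, can be sketched as follows. This is not the project's method; the function name `benjamini_hochberg`, the level `q`, and the example p-values are illustrative assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag which hypotheses the Benjamini-Hochberg step-up rule rejects at FDR level q."""
    m = len(p_values)
    if m == 0:
        return []
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")

    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values, even if some of
    # them individually exceed their own rank threshold.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for five features.
    p = [0.01, 0.041, 0.029, 0.20, 0.005]
    print(benjamini_hochberg(p, q=0.05))  # [True, False, True, False, True]
```

BH controls the FDR at level `q` when the p-values are independent or positively dependent. It treats all hypotheses exchangeably and neither draws on external information nor explicitly targets the FNR, which are the two aspects the fourth aim sets out to address.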

## Key facts

- **NIH application ID:** 9969887
- **Project number:** 1R01GM133848-01A1
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** ALI SHOJAIE
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $456,980
- **Award type:** 1 (new application)
- **Project period:** 2020-09-05 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9969887

## Citation

> US National Institutes of Health, RePORTER application 9969887, Novel Statistical Inference for Biomedical Big Data (1R01GM133848-01A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9969887. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
