Novel Statistical Inference for Biomedical Big Data

NIH RePORTER · NIH · R01 · $456,980 · view on reporter.nih.gov ↗

Abstract

Project Summary This project develops novel statistical inference procedures for biomedical big data (BBD), including data from diverse omics platforms, various medical imaging technologies and electronic health records. Statistical inference, i.e., assess- ing uncertainty, statistical significance and confidence, is a key step in computational pipelines that aim to discover new disease mechanisms and develop effective treatments using BBD. However, the development of statistical inference procedures for BBD has lagged behind technological advances. In fact, while point estimation and variable selection procedures for BBD have matured over the past two decades, existing inference procedures are either limited to simple methods for marginal inference and/or lack the ability to integrate biomedical data across multiple studies and plat- forms. This paucity is, in large part, due to the challenges of statistical inference in high-dimensional models, where the number of features is considerably larger than the number of subjects in the study. Motivated by our team's extensive and complementary expertise in analyzing multi-omics data from heterogenous studies, including the TOPMed project on which multiple team members currently collaborate, the current proposal aims to address these challenges. The first aim of the project develops a novel inference procedure for conditional parameters in high-dimensional models based on dimension reduction, which facilitates seamless integration of external biological information, as well as biomedical data across multiple studies and platforms. To expand the application of this method to very high-dimensional models that arise in BBD applications, the second aim develops a data-adaptive screening procedure for selecting an optimal subset of relevant variables. The third aim develops a novel inference procedure for high-dimensional mixed linear models. This method expands the application domain of high-dimensional inference procedures to studies with longitu- dinal data and repeated measures, which arise commonly in biomedical applications. The fourth aim develops a novel data-driven procedure for controlling the false discovery rate (FDR), which facilitates the integration of evidence from multiple BBD sources, while minimizing the false negative rate (FNR) for optimal discovery. Upon evaluation using ex- tensive simulation experiments and application to multi-omics data from the TOPMed project, the last aim implements the proposed methods into easy-to-use open-source software tools leveraging the R programming language and the capabilities of the Galaxy workflow system, thus providing an expandable platform for further developments for BBD methods and tools.

Key facts

NIH application ID
9969887
Project number
1R01GM133848-01A1
Recipient
UNIVERSITY OF WASHINGTON
Principal Investigator
ALI SHOJAIE
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$456,980
Award type
1
Project period
2020-09-05 → 2024-08-31