Modeling, Inference, and Optimization for Genomic and Biomedical Big Data

NIH RePORTER · NIH · R35 · $539,241

Abstract

The biomedical sciences are drowning in big data. Progress in fields such as genomics and medical imaging is being stymied by the lack of appropriate computational tools. This grant promotes the development of algorithms, statistical methods, and software for the analysis of the big datasets encountered in the biomedical sciences. The NIH All of Us Program, the Million Veteran Program (MVP) sponsored by the US Department of Veterans Affairs (VA), and the UK Biobank are three prime examples of recent massive datasets. These datasets require terabytes of storage, with sample sizes ranging from 10^5 to 10^6 subjects and above. The datasets are also dynamic, growing over time in size and complexity. In addition, the datasets are heterogeneous; for example, the UK Biobank offers genomic data, electronic health record (EHR) data, and imaging data on the same study individuals. Finally, as with most real-world data, the data are fraught with missingness and inaccuracy.

We propose attacking the issues of parameter estimation and model selection raised by such massive datasets. We will be guided by principles of parsimony and high-dimensional optimization. Most of the specific applications we have in mind involve imaging and genomics, particularly genome-wide association discovery. Fortunately, most of the tools and software we construct will be more generically useful. Our successful algorithms will be coded in the modern scientific programming language Julia and posted on publicly available websites. We will focus on constrained and sparse regression, EM and MM algorithms for optimization, variance components models, bootstrapping of linear mixed models, a copula-like model for correlated data, and sensitivity analysis in epidemic models. These are all subjects of paramount importance in modern genomics, biostatistics, and data mining.

Key facts

NIH application ID: 10438722
Project number: 5R35GM141798-02
Recipient: UNIVERSITY OF CALIFORNIA LOS ANGELES
Principal Investigator: Kenneth L Lange
Activity code: R35
Funding institute: NIH
Fiscal year: 2022
Award amount: $539,241
Award type: 5
Project period: 2021-07-01 → 2026-05-31