# Modeling, Inference, and Optimization for Genomic and Biomedical Big Data

> **NIH R35** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2024 · $539,241

## Abstract

The biomedical sciences are drowning in big data. Progress in fields such as genomics and medical imaging is being stymied by the lack of appropriate computational tools. This grant promotes the development of algorithms, statistical methods, and software for the analysis of the big datasets encountered in the biomedical sciences. The NIH All of Us Program, the Million Veteran Program (MVP) sponsored by the US Department of Veterans Affairs (VA), and the UK Biobank are three prime examples of recent massive datasets. These datasets require terabytes of storage, with sample sizes ranging from 10⁵ to 10⁶ subjects and above. The datasets are also dynamic, growing over time in size and complexity. In addition, the datasets are heterogeneous; for example, the UK Biobank offers genomic data, electronic health record (EHR) data, and imaging data on the same study individuals. Finally, as with most real-world data, the data are fraught with missingness and inaccuracy.

We propose attacking the issues of parameter estimation and model selection raised by such massive datasets. We will be guided by principles of parsimony and high-dimensional optimization. Most of the specific applications we have in mind involve imaging and genomics, particularly genomewide association discovery. Fortunately, most of the tools and software we construct will be more generically useful. Our successful algorithms will be coded in the modern scientific programming language Julia and posted on publicly available websites. We will focus on constrained and sparse regression, EM and MM algorithms for optimization, variance components models, bootstrapping of linear mixed models, a copula-like model for correlated data, and sensitivity analysis in epidemic models. These are all subjects of paramount importance in modern genomics, biostatistics, and data mining.

## Key facts

- **NIH application ID:** 10845683
- **Project number:** 5R35GM141798-04
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Kenneth L Lange
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $539,241
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2021-07-01 → 2026-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10845683

## Citation

> US National Institutes of Health, RePORTER application 10845683, Modeling, Inference, and Optimization for Genomic and Biomedical Big Data (5R35GM141798-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10845683. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
