# Expressive and scalable statistical models for genomic and biomedical data

> **NIH R35** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2024 · $334,935

## Abstract

Project Summary
My lab develops and applies statistical models to make sense of genomic and biomedical data with the ultimate goal
of understanding the biological basis of diseases and improving human health. The dramatic decrease in the cost
of DNA sequencing has led to the emergence of datasets of genetic variation across large numbers of individuals
(sample sizes upwards of hundreds of thousands). This genetic data is paired with deep phenotypic and disease
information. While offering the potential to answer important questions in biology and medicine, these complex
and massive datasets present formidable challenges of statistical modeling and inference. Extracting meaningful
insights from these datasets needs expressive and scalable statistical and computational methods. Our recent work
has focused on understanding evolutionary processes that shape genetic variation within homogeneous and admixed
populations and in understanding how genetic variation modulates variation in complex traits and disease risk. A
major discovery from our work is our finding that West African populations derive substantial genetic ancestry from
an unidentified ghost archaic population, a discovery that was enabled, in turn, by new statistical methods that we developed
to infer local ancestry in admixed populations in the challenging setting where reference genomes for ancestral
populations are unavailable. Work from my lab has also led to statistical inference algorithms that are capable of
analyzing millions of genomes to provide new insights into both evolutionary processes and genetic architecture of
complex traits.
We now propose to substantively expand our research applying statistical machine learning to population and
quantitative genetics with the aim of understanding the interplay between evolution, genes and traits. We will
develop algorithms to uncover complex evolutionary histories from genome sequence data in the presence of
admixture, expressive and scalable models to infer the genetic architecture of complex traits within homogeneous
and admixed populations, and methods for deep learning-based phenotype imputation that deal with the high
rates of missingness in biomedical datasets. Taken together, our efforts will provide powerful analytical tools to
effectively probe the structure and function of the human genome.

## Key facts

- **NIH application ID:** 10842967
- **Project number:** 1R35GM153406-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Sriram Sankararaman
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $334,935
- **Award type:** 1
- **Project period:** 2024-05-01 → 2029-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10842967

## Citation

> US National Institutes of Health, RePORTER application 10842967, Expressive and scalable statistical models for genomic and biomedical data (1R35GM153406-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10842967. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*