# Statistical Methods and Algorithms for Population Genomic Inference

> **NIH R01** · UNIVERSITY OF CALIFORNIA AT DAVIS · 2021 · $503,088

## Abstract

Phylogeny is fundamental to our understanding of biology and has translational applications to many areas of human health, including epidemiology, cancer biology and immunology. Genome sequences from closely related species such as the great apes contain a wealth of information about their evolutionary history, including the species phylogeny and divergence times, population demography, and possible episodes of hybridization or admixture. However, extracting this information requires advanced probability models and efficient statistical and computational methods. This is because population genetic processes are stochastic and sequences from closely related species are highly similar, containing only weak historical information about some parameters. For this reason, it is critical to develop parametric statistical methods that maximize the information extracted from the data. In this project we aim to develop efficient Bayesian computational methods for analysis of genome-scale datasets under the multispecies-coalescent-with-introgression (MSci) model.

The proposed research will develop and implement novel algorithms and statistical methods in the program bpp to infer the number, direction, timing, and intensity of introgression events between species (Aim 1). The program will then naturally accommodate both deep coalescence and introgression in the model. This will also allow a novel Bayesian method to be developed for inferring the probability that particular loci (genomic regions) are introgressed from a particular species admixture event for each sequence of a diploid individual (Aim 2). This question is of broad relevance and has been a subject of intense interest with respect to hominid admixtures. Another useful extension will be the addition of ongoing migration between pairs of populations using an efficient new migration model formulation (Aim 3). The method will provide parameter estimates of migration rates that are particularly relevant for designing safe CRISPR gene drive experiments in wild populations. The range of species that the bpp program can be applied to will be expanded by incorporating a more parameter-rich model of DNA substitution (GTR+G) that better accommodates multiple substitutions per site and is necessary for analyzing more distantly related species. Moreover, we will allow fossil calibrations and a relaxed molecular clock, incorporating the features of our other program for divergence time estimation, MCMCtree, into bpp (Aim 4). Fossil calibrations will allow estimates of divergence times in units of years rather than expected DNA substitutions. To broaden the accessibility of the program to users without command-line experience, we will further develop a cross-platform GUI for bpp (BPPg) using a modern JavaScript framework (Aim 5). Finally, the statistical performance of the method will be studied and compared to other methods (when they exist) by simulations and by analysis...

## Key facts

- **NIH application ID:** 10087945
- **Project number:** 5R01GM123306-02
- **Recipient organization:** UNIVERSITY OF CALIFORNIA AT DAVIS
- **Principal Investigator:** Bruce RANNALA
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $503,088
- **Award type:** 5
- **Project period:** 2020-02-01 → 2024-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10087945

## Citation

> US National Institutes of Health, RePORTER application 10087945, Statistical Methods and Algorithms for Population Genomic Inference (5R01GM123306-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10087945. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
