# Population genetics for large-scale sequencing studies of diverse populations

> **NIH R01** · STANFORD UNIVERSITY · 2020 · $136,876

## Abstract

Population-based studies identifying common genetic variants that affect complex human diseases have relied heavily on population-genetic principles in important tasks such as study design, quality control, and genotype imputation. As the emphasis of mapping studies has now shifted to investigating rare variants in next-generation sequencing projects, new opportunities exist for leveraging population genetics to maximize the return from these investigations. Because studies thus far have often focused on populations of European descent, it is critical that new methods provide tools to analyze data from a greater diversity of populations. This project builds on productive efforts in the first funding period, proposing methods that capitalize on the study of human population genetics to enhance the design, analysis, and interpretation of genome sequencing studies, and focusing on analysis of rare risk variants in diverse human populations.

(1) We will devise methods for selecting subsamples of individuals for genome and exome sequencing, particularly in admixed and structured populations. Such subsamples will make it possible for researchers to maximize their potential for achieving statistical power to detect rare disease variants.

(2) We will enhance variant-calling accuracy, particularly in low-coverage data and for challenging indels and copy-number variants, by including in the variant-calling pipeline evidence accumulated from closely related haplotypes in the population. This approach will be particularly beneficial in admixed and genetically diverse populations, in which haplotype variation is especially significant and selecting an informative haplotype subset to assist in variant-calling is of greatest value.

(3) We will use population-genetic principles to improve sample quality control in sequencing studies. First, we address the common challenge of sample contamination, which adversely affects variant-calling and downstream analyses. We will produce a method to estimate the genotypes of the minor contributor of a mixed sample, thus enabling the population of origin of a contaminating signal to be identified. This identification further facilitates variant-calling and permits in silico deconvolution of mixed samples. Second, to enhance the sharing of samples in large projects, we will devise methods to uncover duplicate or related samples from non-overlapping marker sets. Our approach will reduce the risk of expending effort to obtain sequence that will not be fully utilized, and will also assist in making use of historical low-density data in understudied populations.

(4) We will incorporate new advances in the study of human population growth and natural selection for evaluating rare-variant tests and identifying powerful testing strategies. Evaluations of current tools often ignore important population-genetic factors such as selection or accelerating growth; our methods will enhance models for analyzing rare-variant testin...

## Key facts

- **NIH application ID:** 10063406
- **Project number:** 3R01HG005855-10S1
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Noah Rosenberg
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $136,876
- **Award type:** 3
- **Project period:** 2010-09-13 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10063406

## Citation

> US National Institutes of Health, RePORTER application 10063406, Population genetics for large-scale sequencing studies of diverse populations (3R01HG005855-10S1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10063406. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
