Population genetics for large-scale sequencing studies of diverse populations

NIH RePORTER · NIH · R01 · $385,000

Abstract

Population-based studies identifying common genetic variants that affect complex human diseases have relied heavily on population-genetic principles in important tasks such as study design, quality control, and genotype imputation. As the emphasis of mapping studies has now shifted to investigating rare variants in next-generation sequencing projects, new opportunities exist for leveraging population genetics to maximize the return from these investigations. Because studies thus far have often focused on populations of European descent, it is critical that new methods provide tools to analyze data from a greater diversity of populations. This project builds on productive efforts in the first funding period, proposing methods that capitalize on the study of human population genetics to enhance the design, analysis, and interpretation of genome sequencing studies, and focusing on analysis of rare risk variants in diverse human populations. (1) We will devise methods for selecting subsamples of individuals for genome and exome sequencing, particularly in admixed and structured populations. Such subsamples will make it possible for researchers to maximize their potential for achieving statistical power to detect rare disease variants. (2) We will enhance variant-calling accuracy, particularly in low-coverage data and for challenging indels and copy-number variants, by including in the variant-calling pipeline evidence accumulated from closely related haplotypes in the population. This approach will be particularly beneficial in admixed and genetically diverse populations, in which haplotype variation is especially significant and selecting an informative haplotype subset to assist in variant-calling is of greatest value. (3) We will use population-genetic principles to improve sample quality control in sequencing studies. First, we address the common challenge of sample contamination, which adversely affects variant-calling and downstream analyses. We will produce a method to estimate the genotypes of the minor contributor of a mixed sample, thus enabling the population of origin of a contaminating signal to be identified. This identification further facilitates variant-calling and permits in silico deconvolution of mixed samples. Second, to enhance the sharing of samples in large projects, we will devise methods to uncover duplicate or related samples from non-overlapping marker sets. Our approach will reduce the risk of expending effort to obtain sequence that will not be fully utilized, and will also assist in making use of historical low-density data in understudied populations. (4) We will incorporate new advances in the study of human population growth and natural selection for evaluating rare-variant tests and identifying powerful testing strategies. Evaluations of current tools often ignore important population-genetic factors such as selection or accelerating growth; our methods will enhance models for analyzing rare-variant testing method...

Key facts

NIH application ID: 9951083
Project number: 5R01HG005855-10
Recipient: STANFORD UNIVERSITY
Principal Investigator: Noah Rosenberg
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $385,000
Award type: 5
Project period: 2010-09-13 → 2022-06-30