Scalable methods for identity by descent

NIH RePORTER · NIH · R01 · $570,000

Abstract

Over the next few years, large genotyped cohorts will become available (e.g., TOPMed, UK Biobank, All of Us, Million Veteran Program). As sample sizes approach 0.1%-1% of the total population size, such samples will contain many distant relatives and extensive identity-by-descent (IBD) information. This information will enable more sophisticated and powerful genetic analyses beyond single-variant analyses. However, current informatics methods are not efficient enough to handle genotype data at that scale. We will develop new genome informatics methods for biobank-scale genotyped cohorts. We have developed an efficient tool, RaPID, the first computationally feasible method for inferring IBD segments among individuals in a biobank-scale cohort. We demonstrated that RaPID achieves running time linear in the sample size and is over 100 times faster than existing methods. At the same time, RaPID detects more IBD segments, with higher accuracy and sharper segment boundaries, than existing methods. In this application, we propose to develop (1) the RaPID+ method for pairwise IBD detection, which can tolerate and correct phasing errors, offers a principled way of tuning parameters, and can work with genotype data across sequencing and array platforms; (2) the RaPID-diploid method for detecting IBD2 segments; (3) the RaPID-multiway method for identifying IBD clusters; and (4) the RaPID-ancestry method for local ancestry inference across subcontinental populations. The methods will be rigorously tested in simulations using realistic population demographic models, as well as on real data from large cohorts. All methods will be implemented as software that is free for academic use. This project will advance genetic research by developing efficient informatics tools that reveal detailed genetic relationships in very large genotyped cohorts.

Key facts

NIH application ID: 9899283
Project number: 5R01HG010086-03
Recipient: UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Principal Investigator: Shaojie Zhang
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $570,000
Award type: 5
Project period: 2018-06-01 → 2022-03-31