# Scalable methods for identity by descent

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2024 · $623,359

## Abstract

Over the past few years, large genotyped cohorts have grown even larger. We are approaching an era in which genotype
information for a large portion of the population is available. Informatics methods are critically needed for
translating the new information into insights for human genetics. Powered by informatics innovations, the
landscape of identity-by-descent (IBD) segment detection has been transformed in the past 3 years. In 2019, we published RaPID,
the first IBD segment calling method efficient enough for biobank-scale cohorts. Since then, a generation of
methods has been developed for IBD segment calling. In addition, we delivered new algorithms
and methods that enriched the positional Burrows-Wheeler transform (PBWT) data structure. The impact of calling
IBD segments in large cohorts has been demonstrated in the precise characterization of diversity in general population
cohorts, studies of population history and human behavior, IBD-based relatedness estimation, and IBD mapping.
However, the current success in identifying IBD segments from biobank-scale cohorts is only the beginning. Further
informatics method development is needed to fully realize the power of genotype information. First, current
methods mainly target longer IBD segments (greater than 3 or 5 centimorgans (cM)), and the detection power
for shorter segments is insufficient. In addition, accuracy is not uniformly high across all genomic regions and all
populations. Second, current methods mainly target IBD segments shared between a pair of haplotypes. With
large sample sizes, multi-way IBD segments are omnipresent but under-studied. Third, methods for identifying IBD
segments between a query haplotype and a reference panel (1-vs-n) are needed. For a small sample or even
single individuals, a 1-vs-n query against a panel will enable powerful interpretation that leverages the rich information in the
reference panel. However, current IBD segment detection methods mainly operate in a batch-calling mode that
conducts n-vs-n comparisons and thus are not flexible enough to address such needs. In this competitive renewal
project, we propose to further develop efficient, accurate, and flexible algorithms for IBD segment detection for
large biobank-scale data. We will improve IBD segment calling across the genome, across the length spectrum, and
across ethnicities; we will develop methods for multi-way IBD cluster detection; and we will develop
reference-based IBD calling and threading methods. These new informatics methods will enable the community to better
leverage the genetic relationships in large genotyped cohorts for genetic discovery.

## Key facts

- **NIH application ID:** 10929955
- **Project number:** 5R01HG010086-06
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Shaojie Zhang
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $623,359
- **Award type:** 5
- **Project period:** 2018-06-01 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10929955

## Citation

> US National Institutes of Health, RePORTER application 10929955, Scalable methods for identity by descent (5R01HG010086-06). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10929955. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*