# Computational methods for large-scale genotype data

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $450,000

## Abstract

Project Summary

The size of genetic data sets is growing exponentially. At the current rate of growth, the largest reference panels of phased, sequenced individuals will have millions of individuals within 5-7 years. This research will address the computational challenges of performing genotype phasing and imputation in large cohorts and with large reference panels.

Large cohorts from outbred populations typically contain a mixture of nominally unrelated and closely related individuals. Current phasing methods for these large data sets do not model parent-offspring or other close relationships. We will develop a new phasing method that greatly increases phase accuracy in closely related individuals and that scales to large sample sizes.

Increasing reference panel size also increases genotype phase and imputation accuracy. However, computational cost also increases with reference panel size. We will develop a new reference file format that substantially reduces the computational cost of imputation and phasing with large reference panels. We will provide a format specification, software, and software libraries so that other researchers and software developers can readily use the new reference file format.

We will develop a new computational method for finding shared haplotype segments between a reference panel and a target haplotype. This new method will significantly reduce the cost of phasing and imputation using large reference panels.

Finally, we will extend the fastest, most accurate method for genotype phasing and imputation (Beagle 5.0) to analyse chromosome X data. This extension will improve genetic studies of this important chromosome.

## Key facts

- **NIH application ID:** 9985992
- **Project number:** 5R01HG008359-05
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** BRIAN LEE BROWNING
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $450,000
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2015-09-15 → 2023-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9985992

## Citation

> US National Institutes of Health, RePORTER application 9985992, Computational methods for large-scale genotype data (5R01HG008359-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9985992. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
