# Identifying and Characterizing the Full Spectrum of Haplotype-resolved Structural Variation in Human Genomes

> **NIH U24** · JACKSON LABORATORY · 2021 · $2,695,991

## Abstract

PROJECT SUMMARY
The identification of structural variants (SVs) including deletions, insertions, duplications, and inversions from
human whole-genome sequencing (WGS) data is essential for genomic research and precision medicine.
However, SV discovery remains a challenge because no single sequencing technology or computer algorithm
effectively captures the full spectrum of SVs. The Investigators of this project have made substantial advances
toward comprehensive SV discovery by combining analyses from long- and short-read sequencing platforms, as
well as incorporating other technologies such as jumping libraries, linked-read sequencing, and chromosomal
strand-specific sequencing. Application of this approach to the genomes of three father-mother-child trios
identified approximately threefold more SVs than could be detected using standard short-read WGS alone. This
project builds on the Investigators’ ongoing work to develop optimized, integrated multi-technology computational
pipelines for the comprehensive identification of SVs in human genomes. In Aim 1, computational methods will
be developed for SV detection in WGS datasets generated using the multiple genomic technologies described
above, and the combination of computational methods yielding the most comprehensive and accurate SV callset
will be established as computational pipelines that will be packaged for broad sharing. This work will focus on
family trios and unrelated individuals from all 26 populations of the 1000 Genomes Project. Use of trios will also
enable determination of SV mutation rates for the different SV classes. Aim 2 will develop novel SV calling
methods that address the challenging task of SV detection in short-read-only WGS datasets. This work will focus
on genomes sequenced by large-scale NHGRI-funded initiatives that aim to identify genetic variants associated
with disease, such as the Centers for Common Disease Genomics (CCDG) and Centers for Mendelian Genomics
(CMG). Analyses of these short-read WGS datasets will yield a gold standard for genome-wide SV datasets and
serve as a resource that can be used to genotype common variants across the larger number of CCDG, CMG,
and other short-read WGS datasets. Execution of this project will generate deep-coverage WGS and
multi-technology genomic datasets, as well as new SV callsets, for individuals across 26 populations around the world.
These data will be made widely available through an open FTP site. SV datasets for patient samples from CCDG
and CMG will be accessible through dbGaP and enable a more comprehensive association of genetic variants
with human diseases. All computational pipelines will be made available in a portable framework to promote
wide adoption by other users. Overall, this project will establish SV reference sets spanning many human
populations around the world in which all SVs (and small insertions and deletions) have been sequence resolved
and correctly phased along the entire length of the chromoso...

## Key facts

- **NIH application ID:** 10190985
- **Project number:** 5U24HG007497-07
- **Recipient organization:** JACKSON LABORATORY
- **Principal Investigator:** Evan Eichler
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $2,695,991
- **Award type:** 5
- **Project period:** 2013-09-20 → 2023-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10190985

## Citation

> US National Institutes of Health, RePORTER application 10190985, Identifying and Characterizing the Full Spectrum of Haplotype-resolved Structural Variation in Human Genomes (5U24HG007497-07). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10190985. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
