Identifying and Characterizing the Full Spectrum of Haplotype-resolved Structural Variation in Human Genomes

NIH RePORTER · NIH · U24 · $1,888,152

Abstract

PROJECT SUMMARY: The characterization of the full spectrum of genetic variation from whole-genome sequencing (WGS) data is essential for genome research and precision medicine. Recent technological advances have led to substantially improved sensitivity in detecting and characterizing structural variants (SVs) and to the generation of highly contiguous phased genomes; however, some genomic regions (e.g., short arms of acrocentric chromosomes, pericentromeric regions and regions containing complex SVs) remain extremely difficult to assemble and genotype accurately. The investigators on this project are well qualified to tackle this problem. They have worked together for well over a decade to make substantial advances toward comprehensive SV discovery and improved genome assemblies by combining data from multiple technologies and developing new tools for analyzing and integrating these data. This competing renewal for a community resource has four aims. In Aim 1, computational methods will be developed to optimize SV discovery through accurate genome assembly, in the absence of parental sequencing data, and will be applied to 426 samples from all 26 diverse populations of the 1000 Genomes Project (1kGP) for which long-read sequencing data are available from both the Human Genome Structural Variation Consortium (HGSVC) and Human Pangenome Reference Consortium (HPRC) efforts. Aim 2 will develop pipelines that provide the most comprehensive, rapid and low-cost genotyping of SVs in short-read datasets. This will be made possible by the incorporation of pooled Strand-seq data and inversions in 1000 individuals from the 1kGP. Aim 3 will develop pipelines and resources for SV imputation, genotyping, and functional characterization that can be used for future association studies. Proof-of-principle studies will be conducted on the 1kGP and an autism spectrum disorder (ASD) cohort.
Aim 4 will develop a fine-resolution SV resource containing precise breakpoint information and biologically relevant annotations. New visualization and analytical tools will be built into the International Genome Sample Resource (IGSR), making the data and tools from this project widely available in a manner that preserves the complexities of SVs. As part of Aim 4, we also outline a plan to provide dedicated user training for our tools and datasets in different geographical locations, multiple times a year, to maximize research community awareness and adoption. Taken together, our community resource project will provide valuable methods and tools for benchmarking SV discovery and genotyping across WGS datasets in the human genomic research and clinical domains.

Key facts

NIH application ID
10769047
Project number
2U24HG007497-09
Recipient
JACKSON LABORATORY
Principal Investigator
Evan Eichler
Activity code
U24
Funding institute
NIH
Fiscal year
2024
Award amount
$1,888,152
Award type
2
Project period
2013-09-20 → 2028-03-31