Detecting structural variants in a large population of samples through high-throughput sequencing data

NIH RePORTER · NIH · R35 · $387,727

Abstract

PROJECT SUMMARY

The mapping of the human genome and genome-wide association studies have provided great insight into the genetic etiology of hereditary diseases; however, critical gaps remain. One type of genetic variation that has been difficult to detect in genomic studies is structural variants (SVs), disruptions involving more than 50 base pairs. SVs have been implicated in many inherited diseases and cancers, yet their detection remains challenging with conventional DNA sequencing methods. Developments in third-generation sequencing (linked-read and long-read sequencing) and single-cell RNA sequencing (scRNA-seq) provide an opportunity to greatly improve the detection of SVs and copy number variations (CNVs), a common type of SV. However, existing computational tools do not take full advantage of the opportunities that these technologies offer. In this project, drawing on our unique expertise in this rapidly evolving area, we propose to develop a new generation of tools that will greatly improve the detection and phasing of SVs across a large population of samples. We will develop computational tools to generate a high-quality diploid assembly for each individual and to combine data from large populations of controls and patients to characterize SVs that confer risk for a particular disease. We will further design a haplotype-based linkage disequilibrium (LD) mapping approach at the whole-genome scale to identify unique shared haplotype patterns and provide a new perspective for complex disease studies. Detecting SVs in combination with small variants will further allow us to explain the etiology of complex diseases. We will also develop algorithms to detect CNVs from scRNA-seq datasets, which have applications in cancer studies. Successful completion of this project will constitute a major step forward in uncovering the genetic causes of complex diseases and cancers.

Key facts

NIH application ID: 10928181
Project number: 5R35GM146960-03
Recipient: VANDERBILT UNIVERSITY
Principal Investigator: Xin Maizie Zhou
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $387,727
Award type: 5
Project period: 2022-09-20 → 2027-07-31