Deep learning methods for genotyping structural variants in human genomes

NIH RePORTER · NIH · R15 · $375,839

Abstract

PROJECT SUMMARY: Structural variants (SVs) play a causal role in numerous diseases. However, our ability to detect and analyze disease-causing SVs, particularly de novo SVs, in short-read genome sequencing data is limited by inaccurate genotyping (determining zygosity). A substantial gap exists between the genotyping accuracy for small variants, e.g., single nucleotide variants, and that for SVs. Improving the accuracy of SV genotyping will increase the rates of molecular diagnosis, improve our understanding of multiple diseases, and expand our knowledge of human genetic variation. Our aim is to develop more accurate tools for genotyping SVs in short-read genome sequencing data by incorporating the specific genomic context, sequencing instrument, and analysis pipeline into the genotyping model. Instead of attempting to develop a parametric model for those complex and interconnected processes, we generate estimates of the expected evidence using simulation. Our goals are to:

1. Develop a deep learning-based SV genotyper that automatically learns informative features shared by the real and simulated data in an image-based representation of the SV. Treating SV genotyping as an image similarity problem will enable us to more accurately genotype the many different SVs that might exist, not just those observed previously.
2. Deploy our new method to generate accurate genotypes for an ensemble of short- and long-read-derived SV call sets in thousands of human genomes. The resulting dataset will increase our understanding of the spectrum of structural variation across diverse populations.
3. Leverage our similarity model to automatically correct otherwise imprecise or incorrect SV descriptions; doing so will increase genotyping accuracy, improve the integration of different SV call sets, and enable more sensitive SV discovery in the future.
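The similarity idea in the summary can be illustrated with a small sketch. It is not the project's implementation: it assumes that some learned model (not shown) has already turned the real and the simulated SV images into fixed-length feature vectors, and it uses cosine similarity with a pick-the-closest rule as a stand-in for whatever similarity model the project actually trains. The function names, the genotype labels used as dictionary keys, and the example vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no evidence
    return dot / (norm_a * norm_b)


def genotype_by_similarity(real_embedding, simulated_embeddings):
    """Pick the genotype whose simulated evidence best matches the real data.

    real_embedding: feature vector for the observed SV image (assumed input).
    simulated_embeddings: dict mapping a genotype label to the feature vector
        of the image simulated under that genotype (assumed input).
    Returns the best-matching label and the per-genotype similarity scores.
    """
    scores = {
        genotype: cosine_similarity(real_embedding, embedding)
        for genotype, embedding in simulated_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up vectors purely for illustration; real embeddings would come
    # from a trained network applied to real and simulated images.
    simulated = {
        "0/0": [0.0, 0.0, 1.0],  # homozygous reference
        "0/1": [1.0, 1.0, 0.0],  # heterozygous
        "1/1": [2.0, 0.0, 0.0],  # homozygous alternate
    }
    observed = [0.9, 1.1, 0.1]
    best, scores = genotype_by_similarity(observed, simulated)
    print(best, scores)
```

Because the comparison is made against evidence simulated for the specific variant, the same routine applies to an SV that has not been observed before, which is the motivation the summary gives for framing genotyping as a similarity problem.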

Key facts

NIH application ID
10796022
Project number
1R15HG012859-01A1
Recipient
MIDDLEBURY COLLEGE
Principal Investigator
Michael David Linderman
Activity code
R15
Funding institute
NIH
Fiscal year
2024
Award amount
$375,839
Award type
1 (new application)
Project period
2024-09-16 → 2027-08-31