# Scalable detection and interpretation of structural variation in human genomes

> **NIH R01** · UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH · 2020 · $692,048

## Abstract

PROJECT SUMMARY
Structural variation (SV) is a diverse class of genome variation that includes copy number variants (CNVs),
such as deletions and duplications, as well as balanced rearrangements, such as inversions and reciprocal
translocations. A typical human genome harbors more than 4,000 SVs larger than 300 bp, and their large size
increases the potential to delete or duplicate genes, disrupt chromatin structure, and alter gene expression.
Despite their prevalence and potential for phenotypic consequences, SVs remain notoriously difficult to detect
and genotype with high accuracy. Much of this difficulty arises because the DNA sequence alignment “signals”
indicating SVs are far more complex than those for single-nucleotide and insertion-deletion (INDEL) variants.
Unlike SNP alignments, which vary only in allele state, alignments supporting SVs vary in state (whether or not
they support an alternate structure), alignment location, and type. Consequently, the accuracy of SV discovery
is much lower than that of SNPs and INDELs. Furthermore, SV pipelines scale poorly and are difficult to run.
These challenges are a barrier to single-genome analysis, and studies of families must invest substantial effort
in eliminating a sea of false positives. These problems become far more acute for large-scale sequencing efforts
such as TOPMed, the Centers for Common Disease Genomics, and the All of Us program. Software efficiency is key
to scalability for such projects; however, comprehensive, accurate discovery is of equal importance.
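To illustrate why SV alignment signals are harder to interpret than SNP evidence, the sketch below classifies a single paired-end read pair by the SV type its mapping suggests. It assumes a standard inward-facing (forward/reverse) short-read library and uses common discordant-pair heuristics: an unusually large insert size suggests a deletion, same-strand mates an inversion, outward-facing mates a tandem duplication, and mates on different chromosomes a translocation. The `ReadPair` record, the `classify_pair` function, and the z-score threshold are hypothetical illustrations, not the project's method. Whereas SNP evidence reduces to allele counts at a fixed position, each pair here must be judged on its location, orientation, and insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapping of one paired-end fragment (hypothetical minimal record)."""
    chrom1: str
    pos1: int      # leftmost aligned position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost aligned position of mate 2
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  z: float = 4.0) -> str:
    """Return the SV type suggested by one read pair, or "concordant".

    Assumes a forward/reverse (inward-facing) library; thresholds are illustrative.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"   # mates map to different chromosomes
    # Order the mates by position so orientation is judged left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"       # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "duplication"     # outward-facing mates: tandem duplication signature
    # Approximate the fragment span by the distance between mate start positions.
    span = right_pos - left_pos
    if span > mean_insert + z * sd_insert:
        return "deletion"        # mates farther apart than the library allows
    return "concordant"
```

For example, with a library insert size of 350 ± 50 bp, mates at chr1:1000(+) and chr1:6000(-) are classified as a deletion. Practical callers pool many such pairs with split-read and depth evidence, because any single discordant pair may reflect a mapping artifact rather than a true SV.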
Building upon more than a decade of experience developing software and analyzing SV in diverse
disease contexts, we have invested significant effort in understanding why SV discovery remains
insufficiently accurate. This work gives us unique insight into improving the accuracy and scalability
of SV discovery. Our goal is to narrow the accuracy gap between SNP/INDEL discovery and structural
variation discovery. These developments will empower studies of human genomes in diverse contexts and
will therefore have broad impact. Specifically, our aims are to:
1. Develop a deep learning model to correct systematic variation in sequence depth. By correcting
 systematic biases in DNA sequence depth, this new machine learning model will dramatically improve
 the discovery of deletions and duplications.
2. Improve the speed, scalability, and accuracy of SV detection and genotyping. Using new algorithms,
 we will bring the accuracy of SV detection much closer to that of SNP and INDEL discovery and allow
 accurate SV discovery to be deployed at scale.
3. Create a map of genomic constraint for SV from population-scale genome analysis. We will deploy
 our new methods to detect and genotype structural variation among tens of thousands of human genomes.
 The resulting SV map will empower the creation of a model of genomic constraint for SV and enable new
 software to predict deleterious SVs, especially in the...
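Aim 1 targets systematic biases in sequence depth, which confound depth-based deletion and duplication calls. As a point of comparison, the following minimal sketch applies a classical, non-learned correction: bins are stratified by GC content, each bin's read count is divided by the median count of its GC stratum, and the ratio is scaled to a diploid copy number. This GC-stratified median normalization is a simpler, swapped-in technique and not the proposed deep learning model; the function name `gc_normalized_copy_number`, the 20-stratum default, and the input layout (per-bin read counts plus per-bin GC fractions) are assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_normalized_copy_number(counts, gc_fractions, n_strata=20, ploidy=2):
    """Estimate per-bin copy number after GC-stratified median normalization.

    counts: read counts per fixed-size genomic bin
    gc_fractions: GC fraction (0..1) of each bin's reference sequence
    """
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have the same length")
    # Assign each bin to a GC stratum; clamp GC == 1.0 into the top stratum.
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = defaultdict(list)
    for count, stratum in zip(counts, strata):
        by_stratum[stratum].append(count)
    # The median count of each stratum serves as its expected "normal" depth.
    expected = {s: median(values) for s, values in by_stratum.items()}
    copy_numbers = []
    for count, stratum in zip(counts, strata):
        m = expected[stratum]
        copy_numbers.append(ploidy * count / m if m > 0 else float("nan"))
    return copy_numbers


cn = gc_normalized_copy_number(
    [100, 100, 100, 50, 60, 60, 60, 30],
    [0.4] * 4 + [0.6] * 4,
)
print(cn)  # [2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 1.0]
```

In this toy input the high-GC bins are sequenced to lower depth, so their raw counts (60 and 30) differ from those of the low-GC bins (100 and 50). After normalization, both groups read as copy number 2 with a single-copy (heterozygous) deletion in the last bin. A learned model aims to capture biases that a single covariate such as GC content cannot.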

## Key facts

- **NIH application ID:** 9973582
- **Project number:** 1R01HG010757-01A1
- **Recipient organization:** UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
- **Principal Investigator:** Aaron R Quinlan
- **Activity code:** R01
- **Funding institute:** NIH (National Human Genome Research Institute)
- **Fiscal year:** 2020
- **Award amount:** $692,048
- **Award type:** 1 (new)
- **Project period:** 2020-05-01 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9973582

## Citation

> US National Institutes of Health, RePORTER application 9973582, Scalable detection and interpretation of structural variation in human genomes (1R01HG010757-01A1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9973582. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*