# Cross-platform structural variant discovery with deep learning

> **NIH R01** · BROAD INSTITUTE, INC. · 2024 · $575,926

## Abstract

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome, and their
discovery is imperative to advances in precision medicine and our understanding of human genetics. Due to
revolutionary breakthroughs in whole-genome sequencing technologies, we now have access to genomic data at an
unprecedented scale and resolution. However, despite tremendous effort and progress in SV calling methodology,
general SV discovery remains unsolved. Existing techniques use hand-engineered features and heuristics to
model SV classes, relying heavily on developer expertise, which cannot scale to the vast diversity of SV types and
sequencing platforms nor fully harness all the information available in raw sequencing data. As a result, these
methods are usually tightly coupled to the properties of a particular sequencing technology and operate optimally
only on certain SV types and sizes, rendering us blind to many other classes of SVs and their role in disease. Deep
neural networks have the ability to learn complex abstractions automatically from the data and hence offer a
promising avenue for general SV discovery. Deep learning has recently transformed the field of machine learning
and led to remarkable advances in science and medicine. In this proposal we aim to leverage the potential of deep
learning for the problem of SV detection. We lay out how to efficiently formulate SV detection as a deep learning
task, and propose the development of a comprehensive framework to call and genotype SVs of different sizes and
types, including complex and subclonal SVs, given data from a range of sequencing platforms. In particular, we
demonstrate that state-of-the-art results can be obtained using our approach for short-, linked-, and long-read
datasets. To ensure that our models generalize across different datasets, an important goal of our proposal
is also to assemble diverse and representative training data and perform extensive evaluation using publicly
available multi-platform datasets to accurately assess model performance. Our software will be built with
extensibility and scalability in mind, and will be released, along with pretrained models and callsets, freely to the
community.

## Key facts

- **NIH application ID:** 10873957
- **Project number:** 5R01HG012467-03
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Victoria Popic
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $575,926
- **Award type:** 5
- **Project period:** 2022-09-01 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10873957

## Citation

> US National Institutes of Health, RePORTER application 10873957, Cross-platform structural variant discovery with deep learning (5R01HG012467-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10873957. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
