Computational methods for detecting patterns of complex genomic variation

NIH RePORTER · NIH · R01 · $289,446

Abstract

Project Summary: Structural variations (SVs), which involve changes in copy number, inversions, translocations, and other mechanisms, are an important source of genetic variation. They occur in the germ-line and also in somatic cells, where they sometimes play an outsized role in diseases, cancer being a prominent example. Much work has been done in identifying and cataloging 'simple' variants such as deletions, duplications, translocations, and others. In contrast, our continuing proposal is about 'complex' structural variation, characterized by extensive structural changes involving multiple breakpoints and simple SV events. In previous research funded by the grant (17 publications), we developed and extended tools for identifying complex SVs, including Breakage Fusion Bridge, characterized by specific copy number patterns; chains of disparate genomic segments, as defined by Chromothripsis and Chromoplexy; and viral-mediated rearrangements. Perhaps most relevant to the current proposal is the problem of determining the architecture and origin of focal amplification of smaller (< 10 Mb) genomic segments. Working with collaborators, we observed an abundance of large circular, extrachromosomal DNA (ecDNA) (Turner, Nature 2017), detecting them in 40% of all cancer samples across a multitude of histological subtypes. EcDNA are hotspots for complex, even multi-chromosomal genomic rearrangements, and offer a mechanistic explanation of focal amplifications. These discoveries were supported by the development of several computational tools: AmpliconArchitect (AA) for reconstructing the fine structure of ecDNA using Illumina short reads, ViFi for identifying complex variation due to viral integration in humans, and ecDetect for detection and quantification of ecDNA in cytogenetic images acquired in metaphase.
For this grant, we will (i) develop Amplicon Reconstructor (AR) as a tool for disambiguating AA-reconstructed amplicons using long reads (Oxford Nanopore, Pacific Biosciences, and Optical Nanopore technology); (ii) use AR to understand the evolution of complex structural variation through directed evolution of ecDNA in the lab; and (iii) integrate data from thousands of whole genome sequences, transcript and other epigenetic data to elucidate the functional aspects of ecDNA elements.

Key facts

NIH application ID
9818448
Project number
2R01GM114362-05
Recipient
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Principal Investigator
Vineet Bafna
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$289,446
Award type
2
Project period
2016-01-01 → 2023-12-31