Unraveling the topological architecture and phenotypic contexture of structural variation

NIH RePORTER · NIH · R03 · $297,031

Abstract

The increasing adoption of whole-genome sequencing (WGS) in genomic medicine and precision oncology has accelerated the discovery of structural variants (SVs) in patient cancer genomes. However, while human cancer types are generally characterized by widespread genomic instability, the functional consequences of most structural and copy number variants (CNVs) remain poorly understood. Critically, it is unknown which of the hundreds to thousands of genomic rearrangements typically observed in a patient tumor are pathogenic and which are non-functional genomic scars. Because SVs alter the genome at the structural (linear sequence), topological (three-dimensional organization), and phenotypic (epigenetic landscape) levels, integrative and multiscale datasets are necessary to correctly predict their impact. The dearth of such integrative resources and tools critically limits the medical interpretation of patient genetic data. Existing large-scale genomic and proteogenomic cancer characterization efforts, including the Common Fund (CF) Gabriella Miller Kids First (GMKF) data resource, provide rich data to link genetic information, including SVs, with phenotypic consequences such as gene expression. However, these datasets alone are insufficient to provide deep mechanistic and functional insights. CF data sets, specifically 4D Nucleome (4DN), Epigenomics (Roadmap), and GTEx, provide the blueprint to link germline variation, genome topology, and chromatin architecture to gene expression. Therefore, we propose integrating genomic data from patient tumor samples (GMKF) with spatial and functional data (4DN, Roadmap, GTEx), which will allow us to elucidate and predict the pathogenic mechanisms of structural variants:

Aim 1: To create TopVar, a data resource to enhance our understanding of the interplay between genome TOPology and structural VARiation. The integrative TopVar resource will provide the phenotypic context required to interpret SVs in genetic and biological terms, which will yield testable hypotheses regarding their downstream effects.

Aim 2: To develop and evaluate a predictive model of SV pathogenicity across multiple human cancers. Using the structured TopVar data resource, we will implement an interpretable statistical model to predict which SVs have an impact on gene expression, utilizing multiple layers of the integrated data.

The realization of both aims will represent a proof-of-principle for the utility of TopVar for predictive modeling of SVs in the context of precision oncology. While our proposed study will focus on interrogating the comprehensive genomic data generated by GMKF (pediatric cancer) and CPTAC (adult cancer), it will serve as the foundation for their use within real-time sequencing programs, such as MI-OncoSeq and Peds-MI-OncoSeq, focusing on refractory and metastatic tumors.

Key facts

NIH application ID
10356208
Project number
1R03OD032625-01
Recipient
UNIVERSITY OF MICHIGAN AT ANN ARBOR
Principal Investigator
Marcin Piotr Cieslik
Activity code
R03
Funding institute
NIH
Fiscal year
2021
Award amount
$297,031
Award type
1
Project period
2021-09-22 → 2023-09-21