# Integrative genomic and epigenomic analysis of cancer using long read sequencing

> **NIH U01** · JOHNS HOPKINS UNIVERSITY · 2021 · $383,463

## Abstract

PROJECT SUMMARY
The last twenty years have seen extensive growth in the sequencing of cancer genomes, leading to a
dramatically increased understanding of the role of genetic and epigenetic mutations in cancer. This has largely
been enabled by developments in high-throughput “second-generation” sequencing technology and analysis
that characterize cancer genomes using short reads. Recently, a new generation of high-throughput long-read
sequencing instruments, primarily from Pacific Biosciences and Oxford Nanopore, has become available that
is poised to displace short-read sequencing for many applications. We and others have used these
technologies to discover tens of thousands of variants per cancer genome that are not detectable using
short reads, including structural variants and differentially methylated regions in known oncogenes and cancer
risk genes. These technologies carry the potential to address many open questions in cancer biology; however,
the analysis of long-read sequencing data is computationally demanding and needs specialized algorithms that
are either too inefficient to use at scale or do not yet exist. In this proposal, we will address several gaps in the
application of long-read technology for basic research and clinical use in cancer genomics. First, we will
develop improved methods for finding structural variants and complex repeat expansions from long reads,
both of which are major diagnostic and prognostic indicators of disease, yet are not accurately identified using
existing methods. Leveraging the improved phasing capabilities of long reads, this work will include the
detection of mosaic variants, revealing tumor heterogeneity and variants in precancerous tissues. Next, we will
apply machine learning and systems-level advances to accelerate and improve the comparison of variants
across large patient cohorts. Critically, this will compensate for the error-prone nature of single-molecule
long-read sequencing to make these comparisons more accurate when comparing tumor-normal samples or
pedigrees of related patients so that recurrent driving mutations can be accurately identified. Finally, we will
develop integrative methods for the joint analysis of genome, transcriptome, and epigenetic profiling of cancer
genomes. These advances will improve the identification of fusion genes, and allow for entirely new forms of
epigenetic analysis, such as the allele-specific analysis of methylation across transposable elements and other
repetitive elements. Synthesizing the many thousands of novel variants we will detect using our methods, we
will then develop algorithms that will identify and evaluate recurrent genetic or epigenetic variations as
putative driving mutations. All methods will be released open-source and will empower us, our ITCR
collaborators, and the cancer genomics community at large to study genetic and epigenetic variants with
near-perfect accuracy and thereby unlock many new associations to treatment and d...

## Key facts

- **NIH application ID:** 10187808
- **Project number:** 1U01CA253481-01A1
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** MICHAEL SCHATZ
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $383,463
- **Award type:** 1
- **Project period:** 2021-05-01 → 2024-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10187808

## Citation

> US National Institutes of Health, RePORTER application 10187808, Integrative genomic and epigenomic analysis of cancer using long read sequencing (1U01CA253481-01A1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10187808. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
