Enhancement and further development of informatics methods for long-read cancer sequencing

NIH RePORTER · NIH · U24 · $872,555

Abstract

Project Summary

Long-read sequencing is rapidly transforming both our knowledge of the human genome and the way we uncover human genetic variation and alterations. In contrast to the rapid pace of algorithmic innovation for long-read sequencing of human genomes, both informatics development and the generation of long-read cancer genome data have lagged behind. With the accuracy and cost of long-read sequencing both approaching those of short reads, we anticipate that long-read cancer genome sequencing will soon become the new frontier of cancer genomics and the primary engine of cancer genomic discoveries. The overarching goal of this application is to catalyze long-read cancer genome sequencing efforts by developing informatics methods for the discovery and characterization of somatic genetic alterations in cancer genomes. We propose three lines of research activity to achieve this goal. First, we will improve existing methods for long-read analysis, including both long-read alignment and assembly, and develop downstream bioinformatics tools for somatic variant discovery from aligned long reads (Aim 1) and from de novo long-read assemblies (Aim 2). Second, in parallel with this informatics development, we will generate a resource of long-read cancer genome data to be used for benchmarking and evaluating long-read informatics methods (Aim 3). Specifically, we will compare the performance of variant detection by alignment-based and assembly-based approaches in order to establish best practices for long-read cancer genome applications. Finally, we aim to build and expand an active community of researchers who interact with, generate, analyze, or develop informatics methods for long-read cancer genome data (Aim 4).
The community-building effort will initially focus on providing tutorials and user examples based on the newly developed informatics methods and the newly generated long-read data, and will ultimately aim to establish a catalog of reference cancer genome assemblies for use by the cancer research community.

Key facts

NIH application ID: 10990145
Project number: 1U24CA294203-01
Recipient: DANA-FARBER CANCER INST
Principal Investigator: Catarina D. Campbell
Activity code: U24
Funding institute: NIH
Fiscal year: 2024
Award amount: $872,555
Award type: 1 (new application)
Project period: 2024-09-01 → 2029-08-31