# Enhancement and further development of informatics methods for long-read cancer sequencing

> **NIH U24** · DANA-FARBER CANCER INST · 2024 · $872,555

## Abstract

Long-read sequencing is rapidly transforming our knowledge of the human genome as well as the approach to
uncovering human genetic variation and alterations. In contrast to the rapid pace of algorithmic innovations for
long-read sequencing of human genomes, both the informatic development and the generation of long-read
cancer genome data have lagged. With the accuracy and cost of long-read sequencing both approaching
those of short reads, we anticipate that long-read cancer genome sequencing will soon become the new frontier of cancer
genomics and the primary engine of cancer genomic discoveries. The overarching goal of this application is to
catalyze long-read cancer genome sequencing efforts through the development of informatic methods for the
discovery and characterization of somatic genetic alterations in cancer genomes. We propose three lines of
research activities to achieve this goal. First, we will improve existing methods for long-read analysis, including
both long-read alignment and assembly, and develop downstream bioinformatic tools for somatic variant
discovery from aligned long reads (Aim 1) and from de novo long-read assembly (Aim 2). Second, in parallel with
the informatic development, we will generate a resource of long-read cancer genome data to be used for
benchmarking and evaluating long-read informatic methods (Aim 3). We will specifically compare the
performance of variant detection from alignment-based and assembly-based approaches to generate best
practices for long-read cancer genome applications. Finally, we aim to build and expand an active community of
researchers who interact with, generate, analyze, or develop informatic methods for long-read cancer genome
data (Aim 4). The community-building effort will initially focus on providing tutorials and user examples based on
the newly developed informatic methods and newly generated long-read data, and will eventually aim to establish a
catalog of reference cancer genome assemblies for use by the cancer research community.

## Key facts

- **NIH application ID:** 10990145
- **Project number:** 1U24CA294203-01
- **Recipient organization:** DANA-FARBER CANCER INST
- **Principal Investigator:** Catarina D. Campbell
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $872,555
- **Award type:** 1
- **Project period:** 2024-09-01 → 2029-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10990145

## Citation

> US National Institutes of Health, RePORTER application 10990145, Enhancement and further development of informatics methods for long-read cancer sequencing (1U24CA294203-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10990145. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*