# Expanding the capabilities of third-generation sequencing for multi-dimensional genome and transcriptome analysis.

> **NIH DP2** · UNIV OF MASSACHUSETTS MED SCH WORCESTER · 2024 · $1,507,500

## Abstract

Project Summary
In cells, nucleic acids exist as molecular complexes, whose function depends not only on their base-pairing
potential but also on their shapes and chemical structures. The DNA in a cell’s genome forms a highly dynamic
macromolecular structure, called ‘chromatin’, and the DNA can be modified by regulatory proteins (e.g.
5-methylation of cytosine) or it can be damaged by oxidation or genotoxic compounds. RNA is subject to over 170
different modifications, which can impact its stability, structure, or function. There is a wealth of information in
these nucleic acids beyond their ability to form Watson-Crick base pairings, yet next-generation sequencing
approaches can interpret DNA only through its base pairing and lose much of this information.
Third-generation sequencing technologies, such as nanopore sequencing, can theoretically measure modified or
damaged bases. Nanopore sequencing works by passing nucleic acids through an engineered pore and
measuring the current that flows past the five bases of the DNA molecule currently in the pore. Thus, if a modified
base is in the pore, the current will be slightly different than it would be for the unmodified base. However, in
order to recognize these subtle current differences, training libraries with the modified (and unmodified) bases in
every single five-base combination must be sequenced on the device to create a set of reference values that
can be used to decode these subtle changes. This is a complicated and expensive process, beyond the reach
of a single lab, and as such only a couple of modified bases can currently be sequenced by nanopore. My lab
has developed an approach that allows for the rapid and cheap (on the order of several hundred dollars)
generation of these training libraries using procedural barcoded synthesis. With this approach, the generation
of a reference library is easily achievable for a single project by a small academic lab. My lab has generated
training libraries and successfully sequenced numerous modified DNA bases, which has allowed us to perform
innovative multi-dimensional sequencing experiments, reading DNA damage or encoded cellular properties such
as proliferation along with the base sequence. In this project, I seek to expand our technology. First, we will
generate training libraries to allow for direct detection of common damaged bases resulting from processes such
as oxidation or alkylation, which will permit deeper understanding of how DNA damage occurs in settings such
as aging or cancer treatment. Second, we will adapt our method to work with RNA, allowing for direct detection
of the myriad modifications that occur in the transcriptome and are important in the structure, function, and
regulation of mRNA and non-coding functional RNAs alike. The experiments performed as part of this project
will enhance our understanding of how nucleic acids are modified and damaged within cells, and the further
development of our technology will pro...
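
The abstract's requirement that a modified base be sequenced "in every single five-base combination" can be made concrete with a small counting sketch. The code below is illustrative only and is not the lab's method: it assumes a single modified base, written with the hypothetical symbol `M`, alongside the four canonical DNA bases, and it assumes the five-base window described above. It enumerates the sequence contexts a training library would need to cover under those assumptions.

```python
from itertools import product

CANONICAL = "ACGT"
MODIFIED = "M"  # hypothetical symbol standing in for one modified base
K = 5           # five-base window, as described in the abstract


def unmodified_kmers(k=K, canonical=CANONICAL):
    """All k-mers built only from canonical bases."""
    return ["".join(p) for p in product(canonical, repeat=k)]


def kmers_with_modification(k=K, canonical=CANONICAL, modified=MODIFIED):
    """All k-mers over the extended alphabet that contain the modified base."""
    alphabet = canonical + modified
    return ["".join(p) for p in product(alphabet, repeat=k) if modified in p]


plain_contexts = unmodified_kmers()
modified_contexts = kmers_with_modification()

print(len(plain_contexts))     # 4**5 = 1024 canonical five-base contexts
print(len(modified_contexts))  # 5**5 - 4**5 = 2101 contexts containing M
```

Under these assumptions, one modified base roughly triples the number of five-base contexts that need reference current values, and each further modification enlarges the alphabet again, which is consistent with the abstract's point that building training libraries is costly.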

## Key facts

- **NIH application ID:** 10910566
- **Project number:** 1DP2GM159179-01
- **Recipient organization:** UNIV OF MASSACHUSETTS MED SCH WORCESTER
- **Principal Investigator:** William Alexander Flavahan
- **Activity code:** DP2
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $1,507,500
- **Award type:** 1
- **Project period:** 2024-09-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10910566

## Citation

> US National Institutes of Health, RePORTER application 10910566, Expanding the capabilities of third-generation sequencing for multi-dimensional genome and transcriptome analysis. (1DP2GM159179-01). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10910566. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*