Expanding the capabilities of third-generation sequencing for multi-dimensional genome and transcriptome analysis.

NIH RePORTER · NIH · DP2 · $1,507,500

Abstract

Project Summary: In cells, nucleic acids exist as molecular complexes whose function depends not only on their base-pairing potential but also on their shapes and chemical structures. The DNA in a cell’s genome forms a highly dynamic macromolecular structure, called ‘chromatin’, and the DNA can be modified by regulatory proteins (e.g., 5-methylation of cytosine) or damaged by oxidation or genotoxic compounds. RNA is subject to over 170 different modifications, which can impact its stability, structure, or function. There is a wealth of information in these nucleic acids beyond their ability to form Watson-Crick base pairs, yet next-generation sequencing approaches can interpret DNA only through its base pairing and so lose much of this information. Third-generation sequencing technologies, such as nanopore sequencing, can theoretically measure modified or damaged bases. Nanopore sequencing works by passing nucleic acids through an engineered pore and measuring the current that flows past the five bases of the DNA molecule currently in the pore. Thus, if a modified base is in the pore, the current will be slightly different than it would be for the unmodified base. However, to recognize these subtle current differences, training libraries containing the modified (and unmodified) bases in every five-base combination must be sequenced on the device to create a set of reference values that can be used to decode the changes. This is a complicated and expensive process, beyond the reach of a single lab, and as such only a couple of modified bases can currently be sequenced by nanopore. My lab has developed an approach that allows for the rapid and cheap (on the order of several hundred dollars) generation of these training libraries using procedural barcoded synthesis. With this approach, the generation of a reference library is easily achievable for a single project by a small academic lab.
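To give a rough sense of the scale of "every five-base combination", the following minimal sketch enumerates the five-base contexts a training library would need to cover when one modified base is added to the four canonical bases. It is only an illustration under stated assumptions: the symbol "M" for the modified base and the function names kmers and modified_contexts are hypothetical, and the sketch says nothing about how the lab's barcoded synthesis actually builds its libraries.

```python
from itertools import product


def kmers(alphabet, k=5):
    """Return every k-mer over the given alphabet as a list of strings."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def modified_contexts(canonical="ACGT", modified="M", k=5):
    """Return the k-mers that contain at least one copy of the modified base.

    "M" is a hypothetical placeholder symbol for a single modified base.
    """
    return [s for s in kmers(canonical + modified, k) if modified in s]


if __name__ == "__main__":
    unmodified = kmers("ACGT")      # contexts made only of canonical bases
    with_mod = modified_contexts()  # contexts with at least one modified base
    print(len(unmodified), len(with_mod))
```

Under these assumptions there are 4^5 = 1024 unmodified five-base contexts and 5^5 - 4^5 = 2101 contexts containing at least one modified base, which is why a reference set covering all of them is laborious to produce by conventional means.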
My lab has generated training libraries and successfully sequenced numerous modified DNA bases, which has allowed us to perform innovative multi-dimensional sequencing experiments that read DNA damage, or encoded cellular properties such as proliferation, along with the base sequence. In this project, I seek to expand our technology. First, we will generate training libraries to allow direct detection of common damaged bases resulting from processes such as oxidation or alkylation, which will permit a deeper understanding of how DNA damage occurs in settings such as aging or cancer treatment. Second, we will adapt our method to work with RNA, allowing direct detection of the myriad modifications that occur in the transcriptome and are important in the structure, function, and regulation of mRNA and non-coding functional RNAs alike. The experiments performed as part of this project will enhance our understanding of how nucleic acids are modified and damaged within cells, and the further development of our technology will pro...

Key facts

NIH application ID: 10910566
Project number: 1DP2GM159179-01
Recipient: UNIV OF MASSACHUSETTS MED SCH WORCESTER
Principal Investigator: William Alexander Flavahan
Activity code: DP2
Funding institute: NIH
Fiscal year: 2024
Award amount: $1,507,500
Award type: 1
Project period: 2024-09-01 → 2027-08-31