# Novel bioinformatics methods to detect DNA and RNA modifications using Nanopore long-read sequencing

> **NIH R01** · CHILDREN'S HOSP OF PHILADELPHIA · 2024 · $690,069

## Abstract

PROJECT SUMMARY
DNA modifications, such as 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) on DNA, as well as
RNA modifications, such as N6-methyladenosine (m6A) on mRNA, have been implicated in gene regulation and
human diseases. Synthetic base analogs, such as BrdU, EdU and IdU, have been used as genomic markers to
study fundamental biological processes. However, conventional approaches to detect DNA/RNA modifications
rely on indirect readouts (such as bisulfite treatment or immunoprecipitation), cannot assay repetitive regions (due
to the use of short reads), and suffer from various technical biases. While direct DNA/RNA sequencing on the
Oxford Nanopore platform can address these technical limitations, there is an urgent need to develop reliable
bioinformatics methods to detect common DNA/RNA methylations from ionic current data, with the ability to
extend to rarer forms of modification. We have spent years developing computational tools
for signal-level analysis of long-read sequencing data. We developed NanoMod, which detects DNA modifications
synthetically introduced into replicating cells, and DeepMod, which uses a deep neural network to predict
5mC directly from the ionic current signals of Nanopore sequencing. In the current proposal, we will: (1) Develop
LongReadSum, a multithreaded C++ tool with modules for diverse formats (FASTA,
FASTQ, FAST5, BAM, POD5, etc.), for ultrafast quality control (QC) and signal summarization from Nanopore
sequencing. The signal summarization procedure generates user-specified feature vectors that can be used by
other downstream machine-learning tools for calling modifications. (2) Develop ModDNA, where we will use
connectionist temporal classification (CTC) and transformers, two neural network techniques, to call modified
bases such as 5mC, 5hmC and 6mA. Additionally, we will adapt the computational pipeline to reduced-representation
methylation sequencing (RRMS) data, which enables assaying a human genome on a single
MinION flow cell. (3) Develop ModRNA, an integrative model that combines prior genomic features with
context-dependent features (for example, enrichment near the 3’ end of genes) for both de novo and model-based
detection of RNA m6A modifications and other rare modifications. (4) Validate and improve the computational
tools using benchmarking datasets. We will perform Nanopore DNA sequencing on cancer samples with
paired methylation profiles from clinical diagnostic labs, as well as mouse reference cell lines with or without 5-
Aza-2’-deoxycytidine (methylation inhibitor) treatment. We will perform direct mRNA sequencing on reference
cell lines with or without METTL3/METTL14 knockdown, with in vitro-transcribed RNA, or with and without KSHV
infection, which alters epitranscriptomic profiles. Successful completion of the proposed project will deliver a
computational toolbox for DNA/RNA modification detection via Nanopore sequencing, provide reference
datasets t...
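
The signal summarization step in aim (1) can be illustrated with a minimal Python sketch. It assumes the raw current of one read has already been segmented into per-base events (for example from a basecaller's signal-to-base mapping), computes a mean, standard deviation and dwell time for each base, and concatenates a window of neighboring bases into one feature vector for a downstream classifier. The function names, input layout and choice of features are hypothetical illustrations, not LongReadSum's interface, which the abstract describes as multithreaded C++.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-base feature: (mean current, std of current, dwell in samples).
Feature = Tuple[float, float, int]


def per_base_features(signal: Sequence[float], base_starts: Sequence[int]) -> List[Feature]:
    """Summarize the raw signal segment assigned to each base.

    signal      -- raw ionic current samples for one read (e.g. in pA)
    base_starts -- ascending sample index where each base's segment begins;
                   the last base's segment runs to the end of the signal
    """
    bounds = list(base_starts) + [len(signal)]
    feats: List[Feature] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            raise ValueError("each base must own at least one signal sample")
        seg = signal[start:end]
        feats.append((mean(seg), pstdev(seg), end - start))
    return feats


def window_vector(feats: Sequence[Feature], center: int, k: int = 1) -> Optional[List[float]]:
    """Concatenate features of bases center-k .. center+k into one flat vector.

    Returns None when the window would run past either end of the read.
    """
    if center - k < 0 or center + k >= len(feats):
        return None
    vec: List[float] = []
    for f in feats[center - k : center + k + 1]:
        vec.extend(f)
    return vec
```

For example, `per_base_features([80.0, 82.0, 90.0, 92.0, 94.0, 70.0, 70.0], [0, 2, 5])` gives `(81.0, 1.0, 2)` for the first base, and `window_vector` on the middle base returns a nine-element vector (three features for each of three bases). A fixed-width window keeps every vector the same length, which most machine-learning classifiers require.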
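
For aim (2), the CTC idea can be sketched as greedy (best-path) decoding over an alphabet extended with modified-base symbols: take the most likely label at each time step, collapse consecutive repeats, then drop blanks. The labels `m` (5mC) and `h` (5hmC) and the alphabet order are hypothetical, not ModDNA's actual output vocabulary. Production basecallers often use beam search or related decoders rather than greedy decoding, and the transformer component mentioned in the abstract is not shown.

```python
from typing import List, Sequence

BLANK = 0
# Hypothetical label set: index 0 is the CTC blank; "m" = 5mC and "h" = 5hmC
# are illustrative extra symbols for modified cytosines.
ALPHABET = ["-", "A", "C", "G", "T", "m", "h"]


def ctc_greedy_decode(probs: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding.

    probs -- one row of label probabilities per signal time step, with columns
             ordered as in ALPHABET.
    """
    # 1. Most likely label at each time step.
    path = [max(range(len(row)), key=lambda j: row[j]) for row in probs]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    out: List[str] = []
    prev = None
    for label in path:
        if label != prev and label != BLANK:
            out.append(ALPHABET[label])
        prev = label
    return "".join(out)
```

For example, a per-step argmax path of C, C, blank, m, G, G, blank, G decodes to `CmGG`: the repeated C and G collapse, while the blank between the last two G runs keeps them as separate bases.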

## Key facts

- **NIH application ID:** 10933502
- **Project number:** 5R01HG013359-02
- **Recipient organization:** CHILDREN'S HOSP OF PHILADELPHIA
- **Principal Investigator:** Kai Wang
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $690,069
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2023-09-22 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10933502

## Citation

> US National Institutes of Health, RePORTER application 10933502, Novel bioinformatics methods to detect DNA and RNA modifications using Nanopore long-read sequencing (5R01HG013359-02). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10933502. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
