Novel bioinformatics methods to detect DNA and RNA modifications using Nanopore long-read sequencing

NIH RePORTER · NIH · R01 · $690,069

Abstract

PROJECT SUMMARY
DNA modifications, such as 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), and RNA modifications, such as N6-methyladenosine (m6A) on mRNA, have been implicated in gene regulation and human diseases. Synthetic base analogs, such as BrdU, EdU and IdU, have been used as genomic markers to study fundamental biological processes. However, conventional approaches to detect DNA/RNA modifications rely on indirect readouts (such as bisulfite treatment or immunoprecipitation), cannot assay repetitive regions (because they use short reads), and suffer from various technical biases. While direct DNA/RNA sequencing on the Oxford Nanopore platform can address these limitations, there is an urgent need for reliable bioinformatics methods that detect common DNA/RNA methylations from ionic current data and can be extended to rarer forms of modification. We have spent years developing computational tools for signal-level analysis of long-read sequencing data. We developed NanoMod, which detects DNA modifications synthetically introduced into replicating cells, and DeepMod, which uses a deep neural network to predict 5mC directly from Nanopore ionic current signals. In the current proposal, we will: (1) Develop LongReadSum, a multi-threaded C++ tool with modules for diverse formats (FASTA, FASTQ, FAST5, BAM, POD5, etc.), for ultrafast quality control (QC) and signal summarization of Nanopore sequencing data. The signal summarization procedure generates user-specified feature vectors that other downstream machine-learning tools can use to call modifications. (2) Develop ModDNA, which will combine connectionist temporal classification (CTC) with transformer neural networks to call modified bases such as 5mC, 5hmC and 6mA.
Additionally, we will adapt the computational pipeline to reduced representation methylation sequencing (RRMS) data, which enables assaying a human genome on a single MinION flow cell. (3) Develop ModRNA, an integrative model that combines prior genomic features with context-dependent features (for example, enrichment near the 3′ ends of genes) for both de novo and model-based detection of RNA m6A and other rare modifications. (4) Validate and improve the computational tools using benchmarking data sets. We will perform Nanopore DNA sequencing on cancer samples with paired methylation profiles from clinical diagnostic labs, as well as on mouse reference cell lines with or without 5-aza-2′-deoxycytidine (a methylation inhibitor) treatment. We will perform direct mRNA sequencing on reference cell lines with or without METTL3/METTL14 knockdown, on in vitro transcribed RNA, and on cells with or without KSHV infection, which alters epitranscriptomic profiles. Successful completion of the proposed project will deliver a computational toolbox for DNA/RNA modification detection via Nanopore sequencing, provide reference datasets t...
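To make the idea of "signal summarization into feature vectors" concrete, here is a minimal Python sketch. It is not LongReadSum's interface (which is planned in C++ and not described here); the function names, the choice of per-base (mean, standard deviation, dwell time) features, and the assumption that the raw current has already been segmented into per-base sample ranges are all hypothetical, chosen only because they are common inputs to modification callers.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# Hypothetical per-base feature: (mean current, std of current, dwell in samples)
Feature = Tuple[float, float, int]


def summarize_segments(signal: Sequence[float],
                       boundaries: Sequence[Tuple[int, int]]) -> List[Feature]:
    """Summarize each [start, end) sample range of the raw signal.

    Assumes an upstream step (not shown) has already mapped signal
    samples to bases; each boundary pair covers one base.
    """
    feats: List[Feature] = []
    for start, end in boundaries:
        seg = signal[start:end]
        if not seg:
            raise ValueError(f"empty segment {start}:{end}")
        feats.append((mean(seg), pstdev(seg), end - start))
    return feats


def window_features(feats: List[Feature], center: int, flank: int = 2) -> List[float]:
    """Concatenate features of bases center-flank .. center+flank.

    Positions that fall off either end of the read are zero-padded so
    every vector has the same length, 3 * (2 * flank + 1).
    """
    vec: List[float] = []
    for i in range(center - flank, center + flank + 1):
        if 0 <= i < len(feats):
            m, s, d = feats[i]
            vec.extend([m, s, float(d)])
        else:
            vec.extend([0.0, 0.0, 0.0])
    return vec
```

For example, with a made-up signal `[80.0, 82.0, 81.0, 95.0, 97.0, 60.0, 62.0, 61.0, 63.0]` split into three bases by `[(0, 3), (3, 5), (5, 9)]`, `window_features(feats, center=1, flank=1)` yields a fixed-length 9-element vector centered on the second base. The fixed length and zero padding are a design choice that lets any downstream classifier consume the vectors directly.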
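CTC is, strictly speaking, a training objective and decoding scheme for sequence labeling rather than a network architecture: the network emits per-frame probabilities over an alphabet plus a special "blank" symbol, and a decoder turns those frames into a base sequence. The sketch below shows only the simplest decoder (greedy, or best-path, decoding: take the most likely symbol per frame, collapse repeats, drop blanks). The extended alphabet with an extra `m` symbol standing for 5mC is a hypothetical assumption for illustration, not the ModDNA design; production basecallers typically use beam search instead of greedy decoding.

```python
from typing import Sequence

BLANK = 0
# Index 0 is the CTC blank; 'm' is a hypothetical symbol for a modified cytosine.
ALPHABET = ["-", "A", "C", "G", "T", "m"]


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str] = ALPHABET,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding over per-frame symbol probabilities."""
    # Most likely symbol index for each frame (ties resolve to the lowest index).
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    out = []
    prev = None
    for idx in path:
        # Emit a symbol only when it differs from the previous frame and is not blank;
        # a blank between two identical symbols therefore yields a genuine repeat.
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)
```

For instance, frames whose most likely symbols are `C C - m m G - G` decode to `CmGG`: the repeated `C` and `m` frames collapse, while the blank between the two `G` frames preserves both.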

Key facts

NIH application ID
10933502
Project number
5R01HG013359-02
Recipient
CHILDREN'S HOSP OF PHILADELPHIA
Principal Investigator
Kai Wang
Activity code
R01
Funding institute
NIH
Fiscal year
2024
Award amount
$690,069
Award type
5
Project period
2023-09-22 → 2027-06-30