Development of methods for transcript quantification and differential expression analysis using long-read sequencing technologies.

NIH RePORTER · NIH · R21 · $35,145

Abstract

The rapid development of third-generation, long-read sequencing (LRS) platforms such as PacBio and Oxford Nanopore Technologies (ONT) has enabled increasingly precise and higher-throughput sequencing of transcripts. Long reads can produce full-length transcript sequences, overcoming much of the uncertainty that short-read methods face in accurately defining transcripts, particularly for genes with alternative splicing (more than 90% of human genes), which short-read sequencing has thus far struggled to resolve. LRS is therefore the natural choice for studying the expression of transcript variants and the role of alternative isoforms in disease and development. While the first iterations of long-read technologies did not produce enough reads to quantify more than the most highly expressed transcripts, the current sequencing depth of up to 8 million reads per SMRT cell on the Sequel 2 platform promises reliable quantification of more modestly expressed genes. Significant yield increases have also been reported for Nanopore. This suggests that LRS may have reached sufficient throughput to enable accurate quantification of gene expression and differential expression analyses. LRS transcriptomics data have, however, specific properties that are absent in other transcriptomics technologies, such as partial matches to reference transcript models. Specific methods for quantification and statistical analysis therefore need to be developed. In this project, we aim to characterize in detail the data distribution of long-read data, propose strategies to deal with their particular read uncertainty issues, and develop new strategies for differential expression analysis. The overarching goal is to create the analytical framework needed to fully leverage LRS technologies for the study of isoform dynamics in relation to biomedically relevant questions.

Key facts

NIH application ID
10041221
Project number
1R21HG011280-01
Recipient
UNIVERSITY OF FLORIDA
Principal Investigator
Ana Victoria Conesa Cegarra
Activity code
R21
Funding institute
NIH
Fiscal year
2020
Award amount
$35,145
Award type
1
Project period
2020-09-01 → 2021-05-06