# Robust Identification and accurate quantification of RNA transcripts on a system wide scale

> **NIH R01** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2020 · $334,584

## Abstract

Next-generation Illumina RNA sequencing (RNA-seq) is by far the most widely used
assay for investigating animal transcriptomes, and numerous public RNA-seq data sets
have been generated for various biological conditions in multiple species. However,
there remain several barriers in using short RNA-seq reads to accurately identify the
splicing structures and quantify the abundances of full-length RNA transcripts. In this
proposal, we will develop a series of novel statistical and computational methods to
improve the robustness of transcript identification and the accuracy of transcript
quantification from Illumina RNA-seq data. (Aim 1) We will develop a novel screening
method to construct transcript candidates by first detecting sparse splicing structures
from multiple RNA-seq data sets for a given biological condition. These transcript
candidates will significantly reduce the search space of downstream transcript
identification methods and hence improve their precision. (Aim 2) We will develop a
robust transcript identification method to identify novel transcripts in a conservative
manner from RNA-seq data given existing annotations. Our method will be based on
statistical model selection under the Neyman-Pearson paradigm, which will allow users
to control the false positive rate of our identified novel transcripts under any given
threshold with high probability. (Aim 3) We will develop an accurate transcript
quantification method to effectively leverage multiple RNA-seq data sets and to
simultaneously assess the data quality based on low-throughput gold standards and
cross-data similarities. All of these methods will be first used to study transcripts in
mouse macrophages, for which gold-standard qPCR and full-length cDNA sequences will
be generated for training and method validation. The methods will then be more broadly
tested in other biological systems where suitable gold-standard data are available. Our
methods and software will significantly facilitate the use of Illumina RNA-seq data for
gene expression studies at the transcript level, increase reproducibility of scientific
discoveries from transcriptomic studies, and improve our understanding of gene
expression mechanisms in various biological conditions.
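
The high-probability false positive control described in Aim 2 can be illustrated with a generic order-statistic thresholding rule from the Neyman-Pearson classification literature. The sketch below is not the project's model selection method; it assumes, purely for illustration, that each candidate transcript receives a numeric score, that a sample of scores from known false candidates is available, and that those scores are independent draws from a continuous distribution. The function names `violation_probability`, `np_threshold_rank`, and `np_threshold` are hypothetical.

```python
from math import comb


def violation_probability(k: int, n: int, alpha: float) -> float:
    """Probability that using the k-th smallest of n null scores as the
    threshold yields a false positive rate above alpha.

    Assumes i.i.d. null scores from a continuous distribution; the value
    is the upper tail of a Binomial(n, 1 - alpha) distribution at k.
    """
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n: int, alpha: float, delta: float):
    """Smallest rank k (1-based) whose violation probability is <= delta.

    Returns None when even the largest null score is not conservative
    enough, i.e. when the null sample is too small.
    """
    for k in range(1, n + 1):
        if violation_probability(k, n, alpha) <= delta:
            return k
    return None


def np_threshold(null_scores, alpha: float, delta: float) -> float:
    """Score cutoff: candidates scoring strictly above it are called positive.

    With probability at least 1 - delta over the null sample, the false
    positive rate of this cutoff is at most alpha.
    """
    k = np_threshold_rank(len(null_scores), alpha, delta)
    if k is None:
        raise ValueError("null sample too small for the requested alpha and delta")
    return sorted(null_scores)[k - 1]
```

Under these assumptions the largest null score is a valid cutoff only when (1 - alpha)^n <= delta, so with alpha = delta = 0.05 at least 59 null scores are needed. Choosing the smallest qualifying rank keeps the cutoff as low as the guarantee permits, which preserves power to detect true novel transcripts.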

## Key facts

- **NIH application ID:** 9974525
- **Project number:** 5R01GM120507-05
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Jingyi Jessica Li
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $334,584
- **Award type:** 5
- **Project period:** 2016-09-01 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9974525

## Citation

> US National Institutes of Health, RePORTER application 9974525, Robust Identification and accurate quantification of RNA transcripts on a system wide scale (5R01GM120507-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9974525. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
