# Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2021 · $363,164

## Abstract

Alternative splicing (AS) is a gene regulatory mechanism with important roles in human biology and disease.
High throughput sequencing of RNA (RNA-seq) is making it possible to survey the expressed genes and their
alternative splicing variations in a wide variety of cellular conditions. However, the short reads are challenging
to analyze, demanding highly sophisticated computational methods that can extract meaningful AS information
efficiently, accurately, and comprehensively. While there has been great progress so far, current
methods based on assembling the short reads into transcript annotations have reached a plateau. We propose
two innovations that can help overcome the limits. The first is one-step simultaneous analyses of multiple
samples in an RNA-seq collection, in contrast with the current two-step approach that analyzes each sample
separately and then merges the results. The second is to create and interrogate assembly-free representations
of AS. The project will design a suite of tools that will leverage the latent information in large collections of
samples and from heterogeneous data types to build complete and accurate AS signatures of tissues and cell
types, and to elucidate the regulatory circuitry of AS and its functional implications. Aim 1 will develop a
high-performance multi-sample transcript assembly tool, combining subexon graph representations of genes and
AS variations, statistical methods for improved feature detection, and search space reduction techniques for
efficient sample processing. Aim 2 will build highly efficient and accurate feature selection tools to detect and
characterize assembly-free AS variations (subexons and introns), simultaneously from collections of RNA-seq
samples. It will combine novel regularized programs with complex models of intronic 'noise' and other RNA-seq
confounders, enabling analyses of differential splicing and the identification of individual and group-specific variations.
Lastly, Aim 3 will develop a system to comprehensively model the regulatory and functional circuitry of AS and
the effects of mutations, starting from deep learning models of sequences and alignments and integrating
expression, sequence, epigenetic and mutation data across tissues, cell types and conditions. We will
rigorously test and evaluate all tools in simulations and on large public data sets, as well as on thyroid and
head and neck cancer data provided by our collaborators, and we will experimentally validate random subsets
of predictions with capillary electrophoresis and qRT-PCR. Collectively, the concepts, methods and tools will
establish a new framework for analyzing RNA-seq data that can efficiently tackle the 'big data' challenges,
leading to more complete discovery and annotation of AS structure and function in human health and disease.

## Key facts

- **NIH application ID:** 10218209
- **Project number:** 5R01GM129085-03
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Liliana D Florea
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $363,164
- **Award type:** 5
- **Project period:** 2019-09-20 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10218209

## Citation

> US National Institutes of Health, RePORTER application 10218209, Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data (5R01GM129085-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10218209. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*