# Bioinformatics platform for Hybrid-Seq transcriptome data analysis

> **NIH R01** · OHIO STATE UNIVERSITY · 2020 · $365,058

## Abstract

While RNA-Seq experiments based on Second Generation Sequencing (SGS) short reads have enabled
remarkable advances in our ability to analyze the transcriptome, a few fundamental problems remain unsolved
due to the high complexity of the genome and the inability to identify combinatorial genomic events. Third
Generation Sequencing (TGS), including PacBio sequencing and Oxford Nanopore Technologies (ONT), which
provide much longer reads (1-100 kb), has the potential to overcome these problems. However, the current
high-cost and laborious strategy of using only PacBio data is not practical for mid-size labs. Hybrid sequencing
(“Hybrid-Seq”), which integrates TGS and SGS data, has emerged as an approach to address the limitations
associated with analysis of short SGS reads and the error rate of TGS reads. However, tools to analyze
Hybrid-Seq transcriptome data are not currently available because the majority of methodological developments have
focused on Hybrid-Seq genomic data. In order to improve our understanding of transcriptome complexity, we
will develop a comprehensive Hybrid-Seq platform of novel statistical and computational methods to analyze
TGS long reads with the aid of SGS short reads, and to identify gene isoforms, fusion transcripts and
allele-specific expression (ASE). The proposed studies build on our published and preliminary work where we
developed methods for error correction for TGS data and detection of novel gene isoforms, which were applied
to Hybrid-Seq transcriptome data from human embryonic stem cells (hESCs). In Aim 1, we will develop
computational and statistical approaches to identify and quantify gene isoforms. In Aim 2, we will develop
computational methods to discover fusion transcripts. In Aim 3, we will determine the haplotypes of gene alleles
and quantify ASE using Hybrid-Seq data. The methods developed in this proposal will be integrated into a
software platform for analysis of Hybrid-Seq transcriptome data. This user-friendly bioinformatics platform will
have important positive impacts by providing an unprecedented opportunity for comprehensive transcriptome
profiling, with broad applicability and higher resolution. In addition, these tools will enable more researchers to
apply Hybrid-Seq to their transcriptome studies.

## Key facts

- **NIH application ID:** 9976556
- **Project number:** 5R01HG008759-06
- **Recipient organization:** OHIO STATE UNIVERSITY
- **Principal Investigator:** Kin Fai Au
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $365,058
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2016-09-09 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9976556

## Citation

> US National Institutes of Health, RePORTER application 9976556, Bioinformatics platform for Hybrid-Seq transcriptome data analysis (5R01HG008759-06). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/9976556. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
