Analysis of genomics datasets at a massive scale

NIH RePORTER · NIH · R01 · $463,213

Abstract

Understanding both normal and pathogenic patterns of human gene expression can help shed light on the biology of human disease. Thousands of studies have now measured gene expression in different tissues and diseases. By aggregating and analyzing all available human RNA-sequencing data using a high-powered computational and statistical framework, we will provide a transformative resource for characterizing human gene expression patterns, including rare transcriptional events, cellular networks, and genetic variation. In Aim 1, we propose to uniformly process all publicly available human transcriptome sequencing data and collect it into a publicly available resource called the Transcriptome Aggregation Resource (TAR); at least 150,000 samples will be processed using cloud computing. This resource will contain single-base resolution maps of expression, de novo mapped exon-exon splice junctions, and allele-specific expression across a set of common variations. We will supplement the expression data with cleaned and predicted metadata. In Aim 2, we will develop the statistical and computational methods necessary to fully realize the potential of this resource. Specifically, we will remove unwanted variation at scale and develop mixture models to summarize the large data resource at the gene, junction, and single-base levels. In Aim 3, we will analyze this resource to address fundamental questions in expression biology, including a systematic study of expression outliers and allele-specific expression at gene, junction, and single-base resolution. We will infer well-powered co-expression networks over both expressed genes and splicing patterns. This work will contribute significantly to our understanding of gene expression by analyzing genomics data at a massive scale.

Key facts

NIH application ID
9978590
Project number
5R01GM121459-04
Recipient
JOHNS HOPKINS UNIVERSITY
Principal Investigator
Kasper Daniel Hansen
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$463,213
Award type
5
Project period
2017-08-01 → 2022-07-31