# Analysis of genomics datasets at a massive scale

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2021 · $461,663

## Abstract

Understanding both normal and pathogenic patterns of human gene expression can help shed light on the biology
of human disease. Thousands of studies have now been undertaken measuring gene expression in different
tissues and diseases. By aggregating and analyzing all available human RNA-sequencing data using a
high-powered computational and statistical framework, we will provide a transformative resource for characterizing
human gene expression patterns including rare transcriptional events, cellular networks, and genetic variation.
In Aim 1 we propose to uniformly process all publicly available human transcriptome sequencing data and collect it
into a publicly available resource called the Transcriptome Aggregation Resource (TAR); at least 150,000 samples
will be processed using cloud computing. This resource will contain single-base resolution maps of expression,
de novo mapped exon-exon splice junctions and allele-specific expression across a set of common variations.
We will supplement the expression data with cleaned and predicted metadata. In Aim 2 we will develop statistical
and computational methods necessary to fully realize the potential of this resource. Specifically, we will remove
unwanted variation at scale and develop mixture models to summarize the large data resource at the gene,
junction and single base levels. In Aim 3 we will analyze this resource to address fundamental questions in
expression biology, including a systematic study of expression outliers and allele-specific expression at the gene,
junction and single base resolution. We will infer well-powered co-expression networks over both expressed
genes and splicing patterns.
This work will contribute significantly to our understanding of gene expression by analyzing genomics data at a
massive scale.

## Key facts

- **NIH application ID:** 10223343
- **Project number:** 5R01GM121459-05
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Kasper Daniel Hansen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $461,663
- **Award type:** 5
- **Project period:** 2017-08-01 → 2023-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10223343

## Citation

> US National Institutes of Health, RePORTER application 10223343, Analysis of genomics datasets at a massive scale (5R01GM121459-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10223343. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
