# ARCHS4: Massive Mining of Publicly Available RNA Sequencing Data

> **NIH U24** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2022 · $790,918

## Abstract

Many independent cancer-related studies that employ bulk and single-cell RNA-seq remain under-reused because of
their limited findability, accessibility, interoperability, and reusability. The data from these studies can be found in
the Gene Expression Omnibus (GEO), but they are provided mostly as raw FASTQ files with non-uniform metadata
annotations. While some studies provide aligned read files, these are processed non-uniformly. This
shortcoming makes it difficult to query and integrate these data across studies and with additional external data.
To bridge the gap that currently exists between RNA-seq data generation and RNA-seq data processing and
reuse, we developed the resource All RNA-seq and ChIP-Seq Sample and Signature Search (ARCHS4).
ARCHS4 provides processed RNA-seq data from GEO to support retrospective data analyses and reuse.
ARCHS4 caters to users with different levels of computational expertise and has already been employed for
many post-hoc analyses and projects. The goals go far beyond providing cancer researchers with direct
access to RNA-seq data through a web-based user interface. We plan to transform other transcriptomics data
into RNA-seq-like profiles with deep learning, identify pathogenic sequences in human RNA-seq samples,
identify short variants from RNA-seq reads, predict gene function from co-expression data, including ways to
modulate the expression of long non-coding RNAs with small molecules, and, most importantly, continue to
provide a free FASTQ alignment service to the community using the cost-effective ARCHS4 infrastructure.

## Key facts

- **NIH application ID:** 10527721
- **Project number:** 1U24CA264250-01A1
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Avi Ma'ayan
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $790,918
- **Award type:** 1
- **Project period:** 2022-09-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10527721

## Citation

> US National Institutes of Health, RePORTER application 10527721, ARCHS4: Massive Mining of Publicly Available RNA Sequencing Data (1U24CA264250-01A1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10527721. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
