# Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor

> **NIH U24** · GRADUATE SCHOOL OF PUBLIC HEALTH AND HEALTH POLICY · 2022 · $318,602

## Abstract

Bioconductor is an ecosystem of more than 2,000 open-source software packages for the reproducible
bioinformatics analysis of various types of genomic data. Aim 1 of our parent grant, “Cancer Genomics:
Integrative and Scalable Solutions in R/Bioconductor” (7U24CA180996), develops and maintains
R/Bioconductor data structures for representation, downstream software development, and analysis of
multimodal cancer datasets. Aim 3 of our parent grant establishes ExperimentHub web resources for the
curation, distribution, maintenance, discoverability, and usability of cancer data resources for the
R/Bioconductor community. This proposal targets hundreds of primarily cancer-focused genomic and
metagenomic datasets that are optimized for R/Bioconductor-based usage and add significant value over
primary sources through harmonization and manual curation. However, translating these datasets into formats
suitable for widely used AI/ML software currently requires substantial domain and Bioconductor-specific
expertise. First, the proposal creates the Bioconductor Machine Learning Repository for Omics by translating existing
R/Bioconductor versions of TCGA, cBioPortal, metagenomics, and other datasets. Second, to assess the
representation and generalizability of any models developed, it uses manual curation to uniformly annotate
key characteristics of each study cohort, including race/ethnicity, sex as a biological variable, geographical
location, and recruitment period. Finally, it provides runnable, documented examples of importing and using
these datasets in TensorFlow, PyTorch, and scikit-learn. In total, this proposal will produce the first large-scale,
platform-independent, AI/ML-ready data repository for diverse and highly curated omics data. Thorough
annotation of the minority status of the studies and samples in our repository will facilitate the identification of
biases and health disparities for marginalized populations.
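To make the translation step concrete: Bioconductor's `SummarizedExperiment` stores an assay with features (e.g. genes) as rows and samples as columns, and keeps per-sample annotation in a separate `colData` table. scikit-learn, by contrast, expects an input array shaped `(n_samples, n_features)`. The minimal sketch below uses toy, hypothetical data and only the Python standard library. It shows the reshaping involved and a simple tally of an annotated cohort characteristic of the kind the proposal describes curating. The function names `to_sample_major` and `tally` and all values are illustrative assumptions, not the repository's actual export format or API.

```python
from collections import Counter

# Hypothetical toy data in a Bioconductor-like layout: the assay is
# features x samples (rows = features, columns = samples), and sample
# metadata is kept in a separate per-sample table (like colData).
features = ["GENE_A", "GENE_B", "GENE_C"]
samples = ["S1", "S2", "S3", "S4"]
assay = [
    [5.1, 0.0, 3.2, 7.7],  # GENE_A across S1..S4
    [1.0, 2.5, 0.4, 0.9],  # GENE_B across S1..S4
    [0.0, 4.4, 6.1, 2.2],  # GENE_C across S1..S4
]
col_data = {
    "S1": {"sex": "female", "tumor": 1},
    "S2": {"sex": "male", "tumor": 0},
    "S3": {"sex": "female", "tumor": 1},
    "S4": {"sex": "male", "tumor": 0},
}


def to_sample_major(assay, samples, col_data, label_key):
    """Transpose a features x samples matrix into one row per sample (X)
    plus a parallel list of per-sample labels (y), the layout that
    scikit-learn-style estimators expect."""
    for row in assay:
        if len(row) != len(samples):
            raise ValueError("each feature row must have one value per sample")
    # zip(*assay) walks the columns, i.e. yields one tuple per sample.
    X = [list(column) for column in zip(*assay)]
    y = [col_data[s][label_key] for s in samples]
    return X, y


def tally(samples, col_data, key):
    """Count samples per value of an annotated cohort characteristic,
    e.g. to check how well each group is represented."""
    return Counter(col_data[s][key] for s in samples)


X, y = to_sample_major(assay, samples, col_data, "tumor")
print(len(X), len(X[0]))  # 4 3
print(X[0])               # [5.1, 1.0, 0.0]
print(sorted(tally(samples, col_data, "sex").items()))  # [('female', 2), ('male', 2)]
```

In practice `X` and `y` would be passed to an estimator's `fit(X, y)`. Keeping the annotations aligned with the rows of `X` is what allows representation to be checked alongside model performance.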

## Key facts

- **NIH application ID:** 10594231
- **Project number:** 3U24CA180996-10S1
- **Recipient organization:** GRADUATE SCHOOL OF PUBLIC HEALTH AND HEALTH POLICY
- **Principal Investigator:** Martin T Morgan
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $318,602
- **Award type:** 3 (revision/supplement)
- **Project period:** 2021-09-01 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10594231

## Citation

> US National Institutes of Health, RePORTER application 10594231, Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor (3U24CA180996-10S1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10594231. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
