# A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration

> **NIH U24** · UNIVERSITY OF TX MD ANDERSON CAN CTR · 2022 · $395,243

## Abstract

Abstract: Technical batch effects pose a fundamental challenge to quality control and reproducibility of even
single-laboratory research projects, but the possibilities for serious error are greatly magnified in complex,
multi-institutional enterprises such as the cancer molecular profiling projects being undertaken by the NCI Center for
Cancer Genomics (CCG). To aid in detection, quantitation, interpretation, and (when appropriate) correction for
technical batch effects in such data, we have developed the MBatch software system. MBatch proved
indispensable for quality-control “surveillance” of data in The Cancer Genome Atlas (TCGA) and ongoing CCG
projects. But detecting and quantitating batch effects (or trend effects or statistical outliers) are just the first steps
in a process. The next steps involve detective work in collaboration with those who generated the data, drawing
upon expertise in integrative analysis across data types, pathways, and systems-level biology. That detective
work usually succeeds in diagnosing the cause of a batch effect as technical or biological. If technical, then
computational methods to ameliorate the batch effect can be applied (judiciously).

The primary aim of the proposed Genome Data Analysis Center (GDAC) is to continue to translate that
successful quality-control model to the CCG’s other current and future large-scale molecular profiling projects.
We will be ready to do that on Day 1. We will continue to enhance and extend the power of MBatch and
incorporate a number of innovative new algorithms, tools, and interactive visualizations into it (OmicPioneer-sc,
MutBatch, CarDEC, and CorNet). Evaluating and correcting batch effects is a complex process, so we will
collaborate with other GDACs and data generating centers to determine the influence of artifacts on any analysis
results they produce. The second aim is to contribute and enhance additional competencies. We are prepared
to (i) provide integrated cluster solutions to segregate cases into biologically relevant groups; (ii) provide tools
and expertise for high-level visualization of omic data (including single-cell data); and (iii) analyze RPPA
proteomic data from the subset of projects that generate such data. Our final aim is to communicate results and
distribute corrected data back to other network members, project stakeholders, and the scientific community.

We bring a number of assets to the table, including multidisciplinary expertise in bioinformatics, biostatistics,
software engineering, cancer biology and cancer medicine; PIs with a combined 40+ years of experience in
molecular profiling of cancers; expertise gained in 10 years of doing the batch effects surveillance for TCGA and
other CCG projects; a highly professional software engineering team with a track record of producing high-end
bioinformatics tools; extensive computing resources, including one of the most powerful academic clusters in the
world; and clos...

## Key facts

- **NIH application ID:** 10492545
- **Project number:** 5U24CA264006-02
- **Recipient organization:** UNIVERSITY OF TX MD ANDERSON CAN CTR
- **Principal Investigator:** Rehan Akbani
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $395,243
- **Award type:** 5
- **Project period:** 2021-09-22 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10492545

## Citation

> US National Institutes of Health, RePORTER application 10492545, A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration (5U24CA264006-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10492545. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
