# Removing batch effects in high-throughput biomedical studies

> **NIH R01** · RUTGERS BIOMEDICAL AND HEALTH SCIENCES · 2024 · $284,190

## Abstract

Combining high-throughput biomedical data sets from multiple studies is advantageous to increase statistical
power in studies where logistical considerations restrict sample size or require the sequential generation of data.
However, significant technical heterogeneity is commonly observed across multiple batches of data that are
generated from different processing or reagent batches, experimenters, protocols, or profiling platforms. These
so-called batch effects confound true relationships in the data, reducing the power benefits of combining multiple
batches of data, and may even lead to spurious results. Many methods have been proposed to filter technical
heterogeneity from genomic data. These methods are designed to remove batch effects, unmeasured or
“surrogate” variation, or other “unwanted” variation caused by biological or technical sources. Although these
approaches represent impactful advances in the field, there are still significant gaps that need to be addressed
to appropriately filter technical heterogeneity from -omics data and other high-throughput datasets. For example,
many existing methods assume relevant covariates are known or that raw data are generally independent. Some
applications require more specific and direct correction methods, including single-cell transcriptomics data that
are often missing cell-type identifiers, microbiome data that are compositional in nature, and imaging and spatial
transcriptomics data that have spatially correlated data points. Furthermore, batch correction introduces
correlation into the adjusted data, which needs to be accounted for in downstream analyses, and most
researchers performing batch correction are unaware of this negative impact and often incorrectly apply
downstream analysis tools. Finally, there is still a significant need for additional software tools and benchmark
datasets for evaluating batch effect methods and their efficacy in specific datasets. We propose to develop
algorithms and software to address these specific research gaps facing researchers combining data from
multiple experimental batches.

## Key facts

- **NIH application ID:** 10935948
- **Project number:** 5R01GM127430-07
- **Recipient organization:** RUTGERS BIOMEDICAL AND HEALTH SCIENCES
- **Principal Investigator:** William Evan Johnson
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $284,190
- **Award type:** 5
- **Project period:** 2018-05-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10935948

## Citation

> US National Institutes of Health, RePORTER application 10935948, Removing batch effects in high-throughput biomedical studies (5R01GM127430-07). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10935948. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
