# Removing batch effects in genomic and epigenomic studies

> **NIH R01** · BOSTON UNIVERSITY MEDICAL CAMPUS · 2020 · $325,875

## Abstract

Combining genomic data sets from multiple studies is advantageous to increase statistical power in studies
where logistical considerations restrict sample size or require the sequential generation of data. However,
significant technical heterogeneity is commonly observed across data sets that are generated
from different batches, experiments, or profiling platforms. These so-called batch effects often confound true
biological relationships in the data, reducing the power benefits of combining multiple batches of data, and may
even lead to spurious results. Many methods have been proposed to filter technical heterogeneity and batch
effects from genomic data. However, there are still significant gaps that need to be addressed to more
appropriately filter technical heterogeneity from genomic datasets. For example, existing approaches assume
bell-shaped, symmetric data, which are not appropriate for modern sequencing count data. Furthermore, there
are no current approaches for batch effects in genomic data that measure features at a refined level, for example
epigenetic sequencing data, where nearby features are likely to be closely correlated. Current batch
adjustment methods are dependent on the data batches on hand, meaning that if additional batches of data
were added to the analysis, the batch adjustments would need to be reapplied, resulting in different adjusted
genomic data values. In addition, batch correction usually introduces correlation into the adjusted data, which
needs to be accounted for in downstream analyses; most researchers performing batch correction before
additional analysis steps are unaware of this negative impact, and as a result often incorrectly apply
downstream analysis tools. Finally, it is not always clear which batch adjustment methods should be applied in
each particular case, so a thorough evaluation is required before an appropriate batch correction strategy can
be devised. These gaps highlight the need for new statistical methods and interactive visualization software to
support researchers in this area. We propose to develop algorithms and software to address
these specific research gaps facing researchers combining data from multiple experimental batches.

## Key facts

- **NIH application ID:** 9926913
- **Project number:** 5R01GM127430-03
- **Recipient organization:** BOSTON UNIVERSITY MEDICAL CAMPUS
- **Principal Investigator:** William Evan Johnson
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $325,875
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-05-01 → 2022-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9926913

## Citation

> US National Institutes of Health, RePORTER application 9926913, Removing batch effects in genomic and epigenomic studies (5R01GM127430-03). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/9926913. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
