# Harmonization of Multi-Site Neuroimaging Data from Complex Study Designs

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2024 · $820,540

## Abstract

PROJECT SUMMARY
The number of large-scale multi-site neuroimaging studies has skyrocketed due to growing investments by
federal governments and private entities interested in brain development, aging, and pathology. This has led to
the accumulation of vast amounts of magnetic resonance imaging (MRI) data. Such data have been acquired
with an increasing degree of technical harmonization of scanning protocols, which has been beneficial for
reducing inter-site differences. However, extensive evidence from our group and others emphasizes that even
after careful technical harmonization, site effects dwarf biological effects of interest. Over the past five
years, there has been an explosion of interest in statistical harmonization methods to address this problem. As
part of a highly successful first project period, our group has pioneered the translation of tools from statistical
genomics – such as the ComBat family of methods – to neuroimaging data. These widely adopted methods use
empirical Bayes to correct for site effects based on the means, variances, and covariances of imaging features.
However, in this context, two key challenges have arisen: missing phenotypes and nonlinear effects. First, as
precision medicine turns to neuroimaging, harmonization in the context of diagnostic and prognostic biomarkers
has become a central issue. Our group has shown that harmonization methods that aim to preserve disease
information in imaging data are critical for scientific rigor and reproducibility. However, such information is not
available without knowledge of the phenotype of interest. In these settings, translational investigators are caught
in a catch-22: harmonization is not possible without knowing an individual's diagnosis, which is exactly the
information targeted for prediction. To address this, here we propose to develop new methods that stochastically
impute the phenotype and assess the uncertainty in the predicted harmonization. Second, deep learning has
revolutionized the field of predictive modeling due to its sensitivity to complex nonlinear effects. However, its
flexibility makes it highly sensitive to diverse technical biases. While there have been several initial forays in
using deep learning for harmonization, currently available approaches have critical limitations – such as an
inability to address confounding. In this proposal, we will develop novel hybrid deep statistical methods for
improved harmonization to mitigate nonlinear scanner effects. We will extend these developments to the setting
of longitudinally acquired imaging data, apply these in two large multi-center cohort studies, and release
user-friendly software packages for the imaging science community.
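
To make the idea of correcting site effects through feature means and variances concrete, here is a minimal sketch of a plain location-scale adjustment on one synthetic feature. It is an illustration under stated assumptions, not the ComBat method or any software from this project: it omits the empirical Bayes shrinkage, covariate preservation, and covariance modeling mentioned in the abstract. The function name `location_scale_harmonize`, the site labels, and the data values are all hypothetical.

```python
from statistics import mean, stdev

def location_scale_harmonize(values, sites):
    """Rescale each site's values to share the pooled mean and standard deviation.

    Simplified illustration only: no empirical Bayes shrinkage and no
    protection of biological covariates (assumptions of this sketch).
    """
    pooled_mean = mean(values)
    pooled_sd = stdev(values)
    harmonized = list(values)
    for s in set(sites):
        # Indices of the observations acquired at site s.
        idx = [i for i, label in enumerate(sites) if label == s]
        site_vals = [values[i] for i in idx]
        site_mean, site_sd = mean(site_vals), stdev(site_vals)
        for i in idx:
            # Standardize within site, then map onto the pooled scale.
            harmonized[i] = (values[i] - site_mean) / site_sd * pooled_sd + pooled_mean
    return harmonized

# Synthetic single feature measured at two hypothetical sites;
# site "B" carries an artificial shift and larger spread.
values = [2.0, 2.5, 3.0, 3.5, 7.0, 8.0, 9.0, 10.0]
sites = ["A", "A", "A", "A", "B", "B", "B", "B"]
harmonized = location_scale_harmonize(values, sites)
```

After this adjustment both sites have the same mean and standard deviation for the feature. The sketch also shows why the missing-phenotype problem matters: if site "B" happened to enroll more patients than site "A", this naive rescaling would remove real disease-related differences along with the scanner effect.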

## Key facts

- **NIH application ID:** 10901489
- **Project number:** 2R01MH123550-05A1
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Russell Takeshi Shinohara
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $820,540
- **Award type:** 2 (competing renewal)
- **Project period:** 2020-06-10 → 2029-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10901489

## Citation

> US National Institutes of Health, RePORTER application 10901489, Harmonization of Multi-Site Neuroimaging Data from Complex Study Designs (2R01MH123550-05A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10901489. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
