Harmonization of Multi-Site Neuroimaging Data from Complex Study Designs

NIH RePORTER · NIH · R01 · $820,540

Abstract

PROJECT SUMMARY
The number of large-scale multi-site neuroimaging studies has skyrocketed due to growing investments by federal governments and private entities interested in brain development, aging, and pathology. This has led to the accumulation of vast amounts of magnetic resonance imaging (MRI) data. Such data have been acquired with an increasing degree of technical harmonization of scanning protocols, which has been beneficial for reducing inter-site differences. However, extensive evidence from our group and others emphasizes that even after careful technical harmonization, site effects dwarf biological effects of interest. Over the past five years, there has been an explosion of interest in statistical harmonization methods to address this problem. As part of a highly successful first project period, our group has pioneered the translation of tools from statistical genomics, such as the ComBat family of methods, to neuroimaging data. These widely adopted methods use empirical Bayes to correct for site effects based on the means, variances, and covariances of imaging features. However, in this context, two key challenges have arisen: missing phenotypes and nonlinear effects. First, as precision medicine turns to neuroimaging, harmonization in the context of diagnostic and prognostic biomarkers has become a central issue. Our group has shown that harmonization methods that aim to preserve disease information in imaging data are critical for scientific rigor and reproducibility. However, preserving such information is not possible without knowledge of the phenotype of interest. In these settings, translational investigators are caught in a catch-22: harmonization is not possible without knowing an individual's diagnosis, which is exactly the information targeted for prediction. To address this, here we propose to develop new methods that stochastically impute the phenotype and assess the uncertainty in the predicted harmonization.
Second, deep learning has revolutionized the field of predictive modeling due to its sensitivity to complex nonlinear effects. However, its flexibility makes it highly sensitive to diverse technical biases. While there have been several initial forays into using deep learning for harmonization, currently available approaches have critical limitations, such as an inability to address confounding. In this proposal, we will develop novel hybrid deep statistical methods for improved harmonization to mitigate nonlinear scanner effects. We will extend these developments to the setting of longitudinally acquired imaging data, apply them in two large multi-center cohort studies, and release user-friendly software packages for the imaging science community.

Key facts

NIH application ID: 10901489
Project number: 2R01MH123550-05A1
Recipient: UNIVERSITY OF PENNSYLVANIA
Principal Investigator: Russell Takeshi Shinohara
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $820,540
Award type: 2
Project period: 2020-06-10 → 2029-04-30