ABSTRACT
Different magnetic resonance imaging (MRI) scanners and different acquisition parameters can produce very different images of the same patient. This is a significant issue when attempting to use MRIs quantitatively. Multiple studies have shown the promise of quantitative analysis of breast MRIs for diagnosing breast tumors, predicting patient outcomes, assessing cancer risk, and even identifying genomic signatures of cancers. However, image heterogeneity hampers both research progress and the clinical implementation of these findings. In many cases, images from different sources cannot be combined to answer a research question. Furthermore, predictive models developed at one institution may not generalize to other institutions. Although this is a well-recognized problem, there is currently no solution to it in breast MRI. Valuable efforts have been made to address this issue for other organs, predominantly the brain. However, the problem has not been solved for those organs either, and limited validation of the existing methods in practical contexts hampers their implementation. The breast is a non-rigid organ with highly variable composition, which makes harmonization of breast MRIs particularly challenging and renders almost all prior harmonization methods developed for the brain inapplicable. Given the urgent need for harmonization in quantitative research, we propose three harmonization methods that transform an image acquired with one scanner setup so that it assumes the appearance of an image from another scanner setup. We introduce important technical innovations that allow cutting-edge convolutional neural networks to be used for this task. Additionally, we propose a new approach to a question that has not yet received significant systematic consideration: what makes a harmonization algorithm successful or useful? We do not evaluate the pixel-to-pixel match between the harmonized image and a reference image, which is the typical approach.
That approach is impractical in breast imaging because it ideally requires paired images, handles expected image noise poorly, and does not reveal the specific limitations of the evaluated harmonization method. We propose an evaluation framework that assesses harmonization algorithms in terms of different practical applications, including radiomic analysis and deep learning. The study will be conducted as a collaboration among machine learning scientists (Duke and Yale), a breast MRI physicist (Cornell), a radiologist whose research focuses on MRI (Duke), and a biostatistician (Duke). The proposed harmonization and evaluation methods do not require fully paired data and make no assumptions about tissue composition. Therefore, they will be applicable to other organs once implemented with appropriate data for each organ. All harmonization and evaluation algorithms, along with the data, will be made publicly available to spur further research on this crucial, unsolved problem.