Distributed Learning of Deep Learning Models for Cancer Research

NIH RePORTER · NIH · U01 · $394,824 · view on reporter.nih.gov ↗

Abstract

Project Summary Deep learning methods are showing great promise for advancing cancer research and could potentially improve clinical decision making in cancers such as primary brain glioma, where deep learning models have recently shown promising results in predicting isocitrate dehydrogenase (IDH) mutation and survival in these patients. A major challenge thwarting this research, however, is the requirement for large quantities of labeled image data to train deep learning models. Efforts to create large public centralized collections of image data are hindered by barriers to data sharing, costs of image de-identification, patient privacy concerns, and control over how data are used. Current deep learning models that are being built using data from one or a few institutions are limited by potential overfitting and poor generalizability. Instead of centralizing or sharing patient images, we aim to distribute the training of deep learning models across institutions with computations performed on their local image data. Although our preliminary results demonstrate the feasibility of this approach, there are three key challenges to translating these methods into research practice: (1) data is heterogeneous among institutions in the amount and quality of data that could impair the distributed computations, (2) there are data security and privacy concerns, and (3) there are no software packages that implement distributed deep learning with medical images. We tackle these challenges by (1) optimizing and expanding our current methods of distributed deep learning to tackle challenges of data variability and data privacy/security, (2) creating a freely available software system for building deep learning models on multi- institutional data using distributed computation, and (3) evaluating our system to tackle deep learning problems in example use cases of classification and clinical prediction in primary brain cancer. Our approach is innovative in developing distributed deep learning methods that will address variations in data among different institutions, that protect patient privacy during distributed computations, and that enable sites to discover pertinent datasets and participate in creating deep learning models. Our work will be significant and impactful by overcoming critical hurdles that researchers face in tapping into multi-institutional patient data to create deep learning models on large collections of image data that are more representative of disease than data acquired from a single institution, while avoiding the hurdles to inter-institutional sharing of patient data. Ultimately, our methods will enable researchers to collaboratively develop more generalizable deep learning applications to advance cancer care by unlocking access to and leveraging huge amounts of multi-institutional image data. Although our clinical use case in developing this technology is primary brain cancer, our methods will generalize to all cancers, as well as to othe...

Key facts

NIH application ID: 10018827
Project number: 5U01CA242879-02
Recipient: STANFORD UNIVERSITY
Principal Investigator: Jayashree Kalpathy-Cramer
Activity code: U01
Funding institute: NIH
Fiscal year: 2020
Award amount: $394,824
Award type: 5
Project period: 2019-09-16 → 2022-08-31