Early Detection of Pancreatic Cancer with Human-in-the-Loop Deep Learning

NIH RePORTER · NIH · K25 · $178,610

Abstract

The goal of this project is to leverage deep-learning algorithms on Electronic Health Records (EHRs) to improve the early detection of pancreatic ductal adenocarcinoma (PDAC), a malignancy with high mortality and morbidity. Compared to other major types of cancer in the U.S. (e.g., colorectal, prostate, breast, lung), PDAC has a uniquely high mortality, with a 5-year survival of only 3%, largely due to the late stage of diagnosis and the aggressiveness of the malignancy. In this K25 application, we aim to develop novel structured methodologies for systematically incorporating human expert domain knowledge into the training procedure of deep-learning algorithms (the “Human-in-the-Loop” approach) to improve the early detection of PDAC. The overarching hypothesis for this study, which has already been demonstrated in numerous other contexts, is that the “Human-in-the-Loop” approach imbues the deep-learning PDAC prediction model with expert domain knowledge (e.g., clinical workflows, statistical knowledge), resulting in improved model performance and interpretability of results. The proposed research will accomplish three aims. In Aim 1, we will build preprocessing pipelines that generalize to multimodal data from different data collection systems (national, state, and institutional). The resultant pipelines will provide multimodal EHR deep embeddings optimized for the deep-learning applications in Aims 2 and 3. In Aim 2, we will investigate feature grouping strategies that rely on information from clinical workflows and incorporate them into the deep-learning prediction model. The proposed model will provide new clinical predictors represented by single or composite variables according to the grouping strategies. Examples in the literature show that such grouped predictors consistently have superior predictive power compared to their individual components.
In Aim 3, we will study causal relationships between patient variables (including composite variables discovered in Aim 2) and PDAC risk. We will use the Human-in-the-Loop approach, in which the possible causal relationships suggested by the deep-learning model will be evaluated and corrected by human experts (e.g., clinicians, statisticians), to construct faithful Causal Bayesian Networks (CBNs) visualizing causal pathways from patient variables to PDAC risk. The resultant CBNs will be used as a framework for developing a risk assessment questionnaire to collect Patient-Generated Health Data (PGHD), which will be further evaluated and optimized in my future R01 focused on the development of a mobile survey application to efficiently collect PGHD and improve the early detection of PDAC. This proposal can potentially lead to new criteria for identifying patients at high risk for PDAC and inform targeted screening practices that will likely be generalizable to other types of cancer.

Key facts

NIH application ID
10834155
Project number
5K25CA267052-02
Recipient
COLUMBIA UNIVERSITY HEALTH SCIENCES
Principal Investigator
Jiheum Park
Activity code
K25
Funding institute
NIH
Fiscal year
2024
Award amount
$178,610
Award type
5
Project period
2023-05-01 → 2028-04-30