# Domain-Knowledge Informed Deep Learning for Early Detection of Pancreatic Cancer

> **NIH R21** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2022 · $222,661

## Abstract

PROJECT SUMMARY

The goal of this project is to apply deep-learning algorithms to Electronic Health Records (EHRs) to improve early detection of pancreatic ductal adenocarcinoma (PDAC), a malignancy with high mortality and morbidity. Although numerous risk factors have been identified, PDAC is most often found at later stages, when effective treatments are not feasible or their survival benefit is limited. In this R21, we aim to develop novel structured methodologies for systematically incorporating feature-grouping strategies derived from expert domain knowledge into the training procedure of deep-learning algorithms to improve PDAC diagnosis. The overarching hypothesis of this study is that groups of highly correlated variables will combine to form predictors that are interpretable and superior to individual clinical variables (current proposal). Furthermore, these new predictors, each represented by a group of related data, will be useful for other downstream tasks such as risk factor identification via causal discovery (future research).

The proposed research presents an innovative approach to unifying human and artificial intelligence, using explainable algorithms to build interpretable prediction models, in contrast to conventional deep-learning algorithms, whose black-box nature makes them untraceable by humans.

An optimal strategy for creating composite (grouped) variables should maximize both predictive power and human interpretability. We will therefore explore a variety of grouping strategies that rely heavily on human-expert knowledge (e.g., clinical workflows) as well as auto-correlation tests. An effective grouping strategy will allow our prediction model to learn the relative importance of both individual measurements and interpretable groups of measurements in predicting PDAC. Examples in the literature show that such grouped predictors often have superior predictive power compared to their individual components, which can be attributed to the mutual information shared within the group. Different types of explainable (attention) neural networks may also be applied, depending on the group characteristics, to further improve interpretability and prediction accuracy.

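
The idea of grouped predictors weighted by an attention mechanism can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the proposal: the group names, the feature names, the use of a simple mean as the composite variable, and the randomly drawn weights `score_w` and `value_w` that stand in for parameters a real model would learn from EHR data.

```python
import numpy as np

# Hypothetical expert-defined groups (illustrative names, not from the proposal).
GROUPS = {
    "glycemic": ["hba1c", "fasting_glucose"],
    "weight": ["bmi", "weight_change_6mo"],
    "liver_panel": ["alt", "ast", "bilirubin"],
}
# Column order of the input matrix follows the group order above.
FEATURES = [name for members in GROUPS.values() for name in members]


def group_composites(x):
    """Average the (already standardized) features of each group.

    x has shape (n_patients, n_features); the result has shape
    (n_patients, n_groups), one composite variable per group.
    """
    columns = []
    start = 0
    for members in GROUPS.values():
        stop = start + len(members)
        columns.append(x[:, start:stop].mean(axis=1))
        start = stop
    return np.stack(columns, axis=1)


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def group_attention(composites, score_w, value_w):
    """Return per-patient attention weights over groups and a risk logit."""
    weights = softmax(composites * score_w)  # (n_patients, n_groups)
    logit = (weights * composites * value_w).sum(axis=1)  # (n_patients,)
    return weights, logit


# Synthetic data only: random numbers standing in for standardized EHR features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, len(FEATURES)))
composites = group_composites(x)
score_w = rng.standard_normal(len(GROUPS))  # placeholder for learned weights
value_w = rng.standard_normal(len(GROUPS))  # placeholder for learned weights
weights, logit = group_attention(composites, score_w, value_w)
```

Each row of `weights` sums to 1, so it can be read as the relative importance the model assigns to each group for that patient, which is the kind of group-level interpretability the paragraph above describes.
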
We believe that similar methodologies applied to predictive modeling of healthcare data have the potential to fundamentally advance clinical decision making through improved model interpretability. The success of this proposal will be leveraged in a larger ongoing project that aims to establish new causal relationships between various risk factors associated with PDAC. This involves an advanced graph-based approach for building interpretable models. Our direct application of causal discoveries in future research will be a program for collecting patient-generated health data (PGHD) for early PDAC diagnosis.

## Key facts

- **NIH application ID:** 10458067
- **Project number:** 5R21CA265400-02
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Chin Hur
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $222,661
- **Award type:** 5
- **Project period:** 2021-07-28 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10458067

## Citation

> US National Institutes of Health, RePORTER application 10458067, Domain-Knowledge Informed Deep Learning for Early Detection of Pancreatic Cancer (5R21CA265400-02). Retrieved via AI Analytics 2026-06-12 from https://api.ai-analytics.org/grant/nih/10458067. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
