# Developing and Evaluating Multi-Modal Clinical Diagnostic Reasoning Models for Automated Diagnosis Generation

> **NIH R00** · UNIVERSITY OF COLORADO DENVER · 2024 · $248,999

## Abstract

Diagnostic errors affect 12 million patients in the U.S. and contribute to 80,000 deaths per year. The main causes
for diagnostic errors include cognitive biases introduced by healthcare providers, miscommunication between
healthcare teams, lack of access to key data, and not recognizing time-sensitive data in the electronic health
record (EHR). The cognitive burden from information overload in the EHR causes clinicians to take decisional
shortcuts with biased heuristics and miss critical data in the EHR, leading to missed opportunities for timely and
accurate diagnoses. Artificial Intelligence (AI) and clinical Natural Language Processing (cNLP) provide
an opportunity to help understand medical text and can automate EHR analysis, pointing to the promising direction
of invoking medical knowledge and clinical experience as humans do. However, the majority of the cNLP tasks
are not designed for bedside application to generate diagnoses and augment bedside decision-making. We have
gathered preliminary data and designed cNLP benchmark tasks for clinical diagnostic reasoning. Our tasks
address key cognitive processes to build models in this proposal that can synthesize EHR data to generate
diagnoses that align with evidence-based medicine and medical knowledge representation. The proposal aims
to develop novel cNLP models that understand and integrate multi-modal EHR data, and conduct reasoning over
a large-scale medical knowledge base to build a model that provides higher accuracy than current neural network
models. I will first develop a multi-modal generative model that reads in both structured and unstructured EHR
data to output diagnoses using a two-stage training process (Aim 1). In a separate aim, I will construct a
knowledge base using a neural symbolic approach from medical concepts and relations sourced from the
National Library of Medicine's Unified Medical Language System (UMLS). The knowledge base will be part of
the model to generate diagnoses given the information from a daily care note collected in the EHR (Aim 2). The
third aim will design and pilot a clinical diagnostic decision support system using human-centered design
principles. The best models from Aims 1 and 2 will be evaluated for diagnostic accuracy by clinicians in the
system using previously validated instruments for patient safety and diagnostic error (Aim 3). Completion of the
aims will inform future clinical studies on developing NLP-driven clinical decision support tools for reducing
diagnostic error. I will complete this project under the direct supervision of my co-mentors and advisors who have
expertise in developing clinical neural language models, implementation of AI-driven tools in health systems,
and clinical decision support systems with augmented intelligence. Together, this multidisciplinary team brings
nationally renowned expertise in clinical informatics with a track record of successful mentorship. My 4-year
proposal with intensive mentor...

## Key facts

- **NIH application ID:** 11170047
- **Project number:** 4R00LM014308-02
- **Recipient organization:** UNIVERSITY OF COLORADO DENVER
- **Principal Investigator:** Yanjun Gao
- **Activity code:** R00
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $248,999
- **Award type:** 4N
- **Project period:** 2024-09-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11170047

## Citation

> US National Institutes of Health, RePORTER application 11170047, Developing and Evaluating Multi-Modal Clinical Diagnostic Reasoning Models for Automated Diagnosis Generation (4R00LM014308-02). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/11170047. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
