# Joint learning methods for event and relation extraction from clinical narratives

> **NIH R15** · GEORGE MASON UNIVERSITY · 2022 · $424,875

## Abstract

Electronic health records (EHRs), detailing patient status and all aspects of clinical care, can greatly facilitate
quality improvement and surveillance initiatives as well as revolutionize clinical research. The unstructured
clinical narratives in EHRs document critical information, including medical problems, treatments, and
diagnostic tests as well as the rationale for care and outcomes. Natural Language Processing (NLP) and
Information Extraction (IE) systems target the identification of such critical information from clinical narratives.
These systems extract clinical concepts such as medical problems, treatments, and tests; determine the
attributes of these concepts to clarify their presence or absence and other details in a patient; and identify
the interactions of these concepts with each other in terms of predefined relations. Most clinical NLP systems
that tackle the extraction of this information are pipeline based: i.e., extraction of clinical concepts precedes the
determination of their attributes and the determination of relations between clinical concepts. While producing
promising results, these systems suffer from two major limitations: (1) when faced with data imbalance, they
perform best on the more prevalent classes of observations found in the data and suffer on the less prevalent
ones, and (2) they allow errors to cascade between the components. These two limitations can also
compound each other. As a result, the information extracted by NLP systems can be incomplete and
coarse-grained, unable to support clinical applications that require a more fine-grained picture of the patient condition.
In this project, we propose to address these limitations on a clinical information extraction task that aims to
capture a more complete picture of the patient condition with a novel, fine-grained, hierarchical schema for
clinically-salient events and their relations. We define clinically-salient events as medical problems,
treatments, and tests that are documented during patient care. We capture each event in a frame that consists
of a trigger and a set of fine-grained attributes. We build event–event relations on top of events. To address
data imbalance, we propose (i) a novel active learning framework that guides manual annotation efforts
towards diverse and informative samples that can boost automated recognition of less prevalent attributes and
relations. To address cascading errors, we propose (ii) a novel joint learning system that enables multiple tasks
to inform each other for better performance across all tasks. We evaluate our work on multiple note types
from multiple institutions. Expected outcomes include (1) a comprehensive heterogeneous gold-standard
dataset created from multiple institutions for clinically-salient events and relations, (2) NLP methods that
generate state-of-the-art results in extraction of events and relations, and (3) publications that document our
findings. The annotation guidelines ...
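
The frame structure described in the abstract (an event with a trigger, a set of attributes, and event–event relations built on top) can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the class names, the `presence` attribute, the `improves` relation label, and the example sentence are all hypothetical and are not taken from the project's annotation schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Trigger:
    """Text span that anchors an event in the note (hypothetical type)."""
    text: str
    span: Tuple[int, int]  # character offsets [start, end) in the note


@dataclass
class EventFrame:
    """One clinically-salient event: a trigger plus its attributes."""
    event_type: str  # "problem", "treatment", or "test", as in the abstract
    trigger: Trigger
    # Attribute names and values are assumptions made for illustration.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventRelation:
    """Directed relation between two event frames (label is hypothetical)."""
    source: EventFrame
    target: EventFrame
    label: str


# Invented example sentence, not drawn from any dataset.
note = "Chest pain resolved after nitroglycerin."

problem = EventFrame(
    event_type="problem",
    trigger=Trigger("Chest pain", (0, 10)),
    attributes={"presence": "absent"},
)
treatment = EventFrame(
    event_type="treatment",
    trigger=Trigger("nitroglycerin", (26, 39)),
)
relation = EventRelation(source=treatment, target=problem, label="improves")
```

Keeping relations as separate objects that point at frames mirrors the abstract's description of relations being built on top of events; a joint model would predict triggers, attributes, and relations together rather than in a pipeline, which this data-structure sketch does not attempt to show.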

## Key facts

- **NIH application ID:** 10507223
- **Project number:** 2R15LM013209-02A1
- **Recipient organization:** GEORGE MASON UNIVERSITY
- **Principal Investigator:** Ozlem Uzuner
- **Activity code:** R15
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $424,875
- **Award type:** 2
- **Project period:** 2022-08-10 → 2025-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10507223

## Citation

> US National Institutes of Health, RePORTER application 10507223, Joint learning methods for event and relation extraction from clinical narratives (2R15LM013209-02A1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10507223. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
