# AE2Vec: Medical concept embedding and time-series analysis for automated adverse event detection

> **NIH F31** · NORTHWESTERN UNIVERSITY · 2024 · $47,912

## Abstract

Adverse events pose a significant challenge to medical interventions (drugs, devices, others) with an estimated
2.3 million cases of adverse drug events between 1969 and 2002. Adverse events are responsible for longer hospital
stays, higher healthcare costs, and higher mortality. There is a clear need for adverse event surveillance, but the
standards of manual chart review and voluntary reporting are time-consuming and unsustainable. Voluntary
reporting also misses most adverse event cases. The widespread adoption of electronic health records (EHRs)
captures medical data for the majority of US patients and presents an opportunity for sustainable adverse event
surveillance via automated strategies. However, there are two barriers to automating adverse event surveillance.
First, adverse events are poorly represented by International Classification of Diseases (ICD) diagnosis
codes. This has inhibited efforts to use simple rules-based code or flag/trigger approaches, while complex and
high-performing text-mining approaches are thwarted by the difficulty of adapting them to other healthcare sites
and large data networks for wider surveillance. Second, temporal information in the EHR inherent to adverse
event timing and sequencing is challenging to capture. Existing approaches face three challenges: they treat
related medical concepts as independent entities, the rapid growth of data inhibits scaling to
large numbers of medical concepts, and their results are hard for humans to interpret. Our overarching goal is to expand on existing
biomedical informatics tools to better capture adverse events and more comprehensively represent the
full patient medical trajectory to identify archetypes of adverse event development. We will pilot these
methods for cancer patients undergoing immune checkpoint inhibitor (ICI) therapy. In Specific Aim 1, we will
incorporate medical concept embedding and clustering methods to draw a “map” of disease, segmented into
“neighborhoods” labeled for the conditions they describe, including adverse events. In Specific Aim 2, we will
test a novel method for tracking patient trajectories on a map of disease and hypothesize that we can identify
archetypal patient trajectories that have different clinical outcomes using time-series clustering. This work
addresses gaps in EHR-based phenotyping and adverse event surveillance. It has the potential to inform
risk factor identification, prediction of adverse event development, and prognostication of patient
outcomes, as well as to provide a crucial stepping stone for further progress in EHR-based phenotyping in
biomedical informatics. This fellowship award will enable me to develop my skills in biomedical informatics
methods, integrate a clinical perspective into my research, hone my writing and presentation skills, and expand
my professional network. At the conclusion of this award, I will have made strides towards becoming an
independent physician-informaticist, fusing clinical exper...

## Key facts

- **NIH application ID:** 10913344
- **Project number:** 5F31LM014201-02
- **Recipient organization:** NORTHWESTERN UNIVERSITY
- **Principal Investigator:** Steven Duc Tran
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $47,912
- **Award type:** 5
- **Project period:** 2023-09-01 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10913344

## Citation

> US National Institutes of Health, RePORTER application 10913344, AE2Vec: Medical concept embedding and time-series analysis for automated adverse event detection (5F31LM014201-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10913344. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
