# From enrichment to insights

> **NIH R01** · STANFORD UNIVERSITY · 2020 · $643,304

## Abstract

Project Summary
Most medical decisions are made without the support of rigorous evidence, in large part due to the cost
and complexity of performing randomized trials for most clinical situations. In practice, clinicians must
use their judgement, informed by their own and the collective experience of their colleagues. The
advent of the electronic health record (EHR) enables the modern practitioner to algorithmically check
the records of thousands or millions of patients to rapidly find similar cases and compare outcomes.
In addition to filling the inferential gap in actionable evidence, these kinds of analyses avoid issues of
ethics, practicality, and generalizability that plague randomized clinical trials (RCTs). Unfortunately,
identifying patients with the appropriate phenotypes, properly leveraging available data to adjust
results, and matching similar patients to reduce confounding remain critical challenges in every study
that uses EHR data. Overcoming these challenges to improve the accuracy of observational studies
conducted with EHR data is of paramount importance.
Studies using EHR data begin by defining a set of patients with specific phenotypes, analogous to
amassing a cohort for a clinical trial. This process of electronic phenotyping is typically done via a set
of rules defined by experts. Machine learning approaches are increasingly used to complement
consensus definitions created by experts, and we propose several advances to validate and improve this
practice. We will explore and quantify the effects of feature engineering choices to transform the
diagnoses, procedures, medications, laboratory tests and clinical notes in the EHR into a computable
feature matrix. Finally, building on recent advances, we plan to characterize the performance of
existing methods and develop EHR-specific strategies for patient matching.
Our work is significant because we will take on three challenging problems--electronic phenotyping,
feature engineering, and patient matching--that stand in the way of generating insights via EHR data. If
we are successful, we will significantly advance our ability to generate insights from the large amounts
of health data that are routinely generated as a byproduct of clinical processes.

## Key facts

- **NIH application ID:** 10000216
- **Project number:** 5R01LM011369-08
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** NIGAM H SHAH
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $643,304
- **Award type:** 5
- **Project period:** 2013-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10000216

## Citation

> US National Institutes of Health, RePORTER application 10000216, From enrichment to insights (5R01LM011369-08). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10000216. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*