# Deep phenotyping in Electronic Health Records for Genomic Medicine

> **NIH R01** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2020 · $799,965

## Abstract

The overarching goal of the project is to establish a genomic medicine learning system to accelerate genomic
knowledge discovery and application in electronic health records (EHRs). We will integrate deep characteristic
phenotypes extracted from EHRs and evolving knowledge of genotype-phenotype associations to optimize the
accuracy of variant interpretation and the cost-effectiveness of clinical genome/exome sequencing, and to
accelerate the discovery of causal genes by constructing a dynamic genotype-phenotype knowledge network.
Prior knowledge on phenotype-gene relationships and phenotypic information about patients can facilitate the
identification of disease-causing mutations from thousands of genetic variants in the context of clinical genomic
sequencing; however, how best to abstract phenotype information from notes in the EHRs of patients who are
diagnosed with or evaluated for monogenic disorders, standardize the computable representation of
phenotypes, and utilize it in genomic interpretation remains unclear. Additionally, how to systematically compare
phenotypes across diseases to discover new knowledge in human genetics remains a largely untapped area
with great promise. To address these challenges, we will develop and validate scalable and portable open-source
natural language processing (NLP) methods for automated and accurate abstraction of characteristic phenotype
concepts (e.g., “j-shaped sella turcica” and “short stature”) from EHR narratives. We will then develop a
phenotype-driven scoring system called EHR-Phenolyzer to predict the likely candidate genetic variants
associated with the phenotypes for patients with genomic sequencing and a high probability of a monogenic
condition. On this basis, we will develop a probabilistic disease diagnosis and knowledge discovery system using
rich and deep EHR phenotypes, and evaluate these methods for genomic diagnosis and discovery using
large-scale clinical exome sequencing data. Ultimately, these methods will support efficient, effective, and scalable
genomic diagnostics, and facilitate the implementation of genome-guided precision medicine in clinical practice.

## Key facts

- **NIH application ID:** 9925808
- **Project number:** 5R01LM012895-03
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** CHUNHUA WENG
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $799,965
- **Award type:** 5
- **Project period:** 2018-09-17 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9925808

## Citation

> US National Institutes of Health, RePORTER application 9925808, Deep phenotyping in Electronic Health Records for Genomic Medicine (5R01LM012895-03). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/9925808. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
