Deep phenotyping in Electronic Health Records for Genomic Medicine

NIH RePORTER · NIH · R01 · $799,965 · view on reporter.nih.gov ↗

Abstract

PROJECT SUMMARY The overarching goal of the project is to establish a genomic medicine learning system to accelerate genomic knowledge discovery and application in electronic health records (EHRs). We will integrate deep characteristic phenotypes extracted from EHRs and evolving knowledge of genotype-phenotype associations to optimize the accuracy of variant interpretation and the cost-effectiveness of clinical genome/exome sequencing, and to accelerate the discovery of causal genes by constructing a dynamic genotype-phenotype knowledge network. Prior knowledge on phenotype-gene relationships and phenotypic information about patients can facilitate the identification of disease-causing mutations from thousands of genetic variants in the context of clinical genomic sequencing; however, how best to abstract phenotype information from notes in the EHRs of patients who are diagnosed with or evaluated for monogenetic disorders, standardize the computable representation of phenotypes, and utilize it in genomic interpretation remains unclear. Additionally, how to systematically compare phenotypes across diseases to discover new knowledge in human genetics remains a largely untapped area with great promise. To address these challenges, we will develop and validate scalable and portable open-source natural language processing (NLP) methods for automated and accurate abstraction of characteristic phenotype concepts (e.g., “j-shaped sella turcica” and “short stature”) from EHR narratives. We will then develop a phenotype-driven scoring system called EHR-Phenolyzer to predict the likely candidate genetic variants associated with the phenotypes for patients with genomic sequencing and a high probability of a monogenic condition. On this basis, we will develop a probabilistic disease diagnosis and knowledge discovery system using rich and deep EHR phenotypes, and evaluate these methods for genomic diagnosis and discovery using large- scale clinical exome sequencing data. Ultimately, these methods will support efficient, effective, and scalable genomic diagnostics, and facilitate the implementation of genome-guided precision medicine in clinical practice.

Key facts

NIH application ID: 10164857
Project number: 5R01LM012895-04
Recipient: COLUMBIA UNIVERSITY HEALTH SCIENCES
Principal Investigator: CHUNHUA WENG
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $799,965
Award type: 5
Project period: 2018-09-17 → 2024-05-31