# Fair Phenotype Annotation and Genomic Reinterpretation

> **NIH R01** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2024 · $880,069

## Abstract

Given the rapid evolution of genomic knowledge, the need for genomic reinterpretation has been increasing.
However, there is not yet a standard approach for determining for whom, when, and how reinterpretation should
be provided to ensure accuracy, cost-effectiveness and fairness. Unequal access to genomic tests and genetic
specialists has widened health disparities, which could be further exacerbated by limited ancestry-specific genetic data. Our
overarching goal is to design a scalable and sustainable informatics framework to support continuous genomic
reanalysis for symptomatic patients with non-diagnostic exome or genome sequencing in diverse populations.
Extending our prior published work on Doc2HPO, Criteria2Query, Phen2Gene, PhenCards, Phenominal, and
phenotype-disease knowledge graphs, we will first develop a natural language processing (NLP) pipeline to
create a multimodal phenome from clinical notes using the latest Phenopacket schema. By tracking changes
in longitudinal electronic health record (EHR) phenotypes and analyzing them in the context of new evidence
for variants, we will identify individuals who can benefit most from genomic reanalysis. Then we will incorporate
evolving clinical phenotypes extracted from longitudinal EHR data to trigger automatic
variant reinterpretation using an ancestry-aware and age-sensitive knowledge graph (PhenoKG). Unlike typical
phenotype-based gene prioritization tools such as Phen2Gene, here we will build the knowledge graph by
extending our previous efforts and extracting phenotype-genotype relations from the EHR as well as the
literature. This knowledge graph will enable the query, extraction and inference of ancestry-aware, as well as
age-sensitive, phenotype-genotype relationships. By leveraging a multi-layer random-walk integrative network
approach, we will incorporate this heterogeneous knowledge graph into a phenotype-driven gene and variant
prioritization algorithm for continuous genomic reanalysis across diverse populations. With these methodological
developments, we will implement a routine reanalysis informatics pipeline at two academic institutions, Columbia
University Irving Medical Center (CUIMC) and Children’s Hospital of Philadelphia (CHOP). We will evaluate the
improvements in diagnostic yield across a diverse set of clinical exome/genome sequencing data over a 3-year
period. We will evaluate how our approach to fair phenotyping and continuous variant reinterpretation can reduce
genomic health disparities for underserved and underrepresented populations. Ultimately, these methods will
enable informatics-driven, efficient, scalable and fair genomic diagnostics through continuous genomic
variant reinterpretation.
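
The random-walk idea mentioned in the abstract can be illustrated with a small sketch. This is not the project's algorithm: it is a plain single-layer random walk with restart on a toy graph, and the node labels (`pheno:P1`, `gene:G1`, and so on), edge weights, and restart probability are all hypothetical. A multi-layer version would additionally distinguish edge types, for example phenotype-gene versus gene-gene, and allow jumps between layers.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    edges: iterable of (node_a, node_b, positive_weight), treated as undirected.
    seeds: set of nodes (assumed to be present in the graph) the walker restarts from.
    """
    # Build an undirected weighted adjacency map.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    nodes = sorted(adj)

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(adj[n].values()) for n in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed; otherwise follow
        # an edge chosen in proportion to its weight.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / degree[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy graph (hypothetical): two patient phenotypes and three candidate genes.
edges = [
    ("pheno:P1", "gene:G1", 1.0),
    ("pheno:P2", "gene:G1", 1.0),
    ("pheno:P2", "gene:G2", 1.0),
    ("gene:G2", "gene:G3", 1.0),
]
seeds = {"pheno:P1", "pheno:P2"}

scores = random_walk_with_restart(edges, seeds)
# Rank only the gene nodes, highest score first.
ranked = sorted(
    (n for n in scores if n.startswith("gene:")),
    key=lambda n: scores[n],
    reverse=True,
)
```

In this toy graph, the gene linked to both seed phenotypes ranks first and the gene reachable only through another gene ranks last. Rerunning the walk after a patient's phenotype set or the graph's edges change is one way a reanalysis could be refreshed, under the assumptions stated above.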

## Key facts

- **NIH application ID:** 10873870
- **Project number:** 5R01HG013031-02
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Wendy K Chung
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $880,069
- **Award type:** 5
- **Project period:** 2023-07-01 → 2028-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10873870

## Citation

> US National Institutes of Health, RePORTER application 10873870, Fair Phenotype Annotation and Genomic Reinterpretation (5R01HG013031-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10873870. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
