# Learning Precision Medicine for Rare Diseases Empowered by Knowledge-driven Data Mining

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2024 · $696,216

## Abstract

Despite their individual rarity, rare diseases collectively affect one in eleven Americans. Rare disease patients
often face significant diagnostic delays, waiting an average of 6 years from the onset of symptoms to an
accurate diagnosis. Recent advances in precision medicine have accelerated research in rare diseases,
overwhelming clinicians’ capacities to manage and leverage the latest knowledge efficiently in clinical practice.
For example, novel gene mutations related to idiopathic pulmonary fibrosis (IPF) frequently do not appear in
the Human Gene Mutation Database (HGMD) or other knowledge bases and are only present in initial articles.
Additionally, due to the lack of clinical evidence and empirical knowledge, awareness of rare diseases remains
low among healthcare providers, which is a major reason for the diagnostic odysseys that many patients
experience in practice.
Teaming up with the Mayo Clinic Program for Rare and Undiagnosed Diseases (PRaUD) and in partnership with
Vanderbilt University Medical Center (VUMC), we aim to address the translation gap by building a novel
end-to-end informatics framework to accelerate the diagnosis of rare diseases. We will develop
the proposed framework through three specific aims. Aim 1 is to construct RDAccelerate, a computable rare
disease knowledge hub that accumulates and maintains up-to-date knowledge for rare diseases. It is costly to
stay current with the literature and informed with clinical evidence and empirical experience. To address this,
we will leverage the latest natural language processing (NLP) techniques such as pre-trained language models
(PLMs) and data mining techniques such as graph neural network (GNN) embeddings to accelerate the
extraction, integration, and mining of associations from a diverse range of resources. Aim 2 focuses on the
provision of RDRecommend, a deep phenotype-driven system for rare disease differential diagnoses trained
with the up-to-date knowledge in RDAccelerate and longitudinal patient records of rare disease cohorts. Because
these diseases are rare, reaching an accurate diagnosis often takes substantial time and effort. We therefore propose to apply
various recommendation techniques to suggest rare disease differential diagnoses. We will then develop
RDConnect, a web portal to search information, display differential diagnostic recommendations, and collect
clinical evidence automatically for further validation in Aim 3. The proposed informatics framework will be
evaluated through several practice projects at PRaUD in collaboration with clinical co-Investigators. The
framework will be developed through team science collaboration using two rare diseases (IPF and
mastocytosis). We will then validate the framework in supporting two other rare diseases (hypereosinophilic
syndrome [HES] and rare kidney stone) before scaling up to a broad spectrum of rare diseases. The external
generalizability of the solution will be tested through our subsite pa...

## Key facts

- **NIH application ID:** 10922807
- **Project number:** 5R01HG012748-02
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** HONGFANG LIU
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $696,216
- **Award type:** 5
- **Project period:** 2023-09-06 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10922807

## Citation

> US National Institutes of Health, RePORTER application 10922807, Learning Precision Medicine for Rare Diseases Empowered by Knowledge-driven Data Mining (5R01HG012748-02). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10922807. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
