# Integration of Biomedical Ontologies with Deep Learning AI for Research and Diagnosis of Rare Diseases

> **NIH P20** · CLEMSON UNIVERSITY · 2024 · $240,580

## Abstract

Rare diseases collectively impact more than 30 million individuals in the United States and 300-400 million 
individuals worldwide. There are an estimated 7-10 thousand known rare diseases, of which approximately 
80% have a genetic etiology. Although important, genetic sequence data alone is insufficient to determine 
mechanism and diagnosis of rare genetic disorders (RGDs). In-silico variant pathogenicity prediction and 
phenomic information are also critical, though even with these, diagnostic rates remain frustratingly low. Novel 
research paradigms, such as a gene-to-patient approach that samples individuals with high-confidence in-silico 
predicted pathogenic variants from large databases and asks if they share a common phenotype, could 
promote novel discovery and improve diagnostic rates; however, this approach is hampered by the inability to 
easily extract phenotypic information from unstructured data and reliably identify a shared phenotype among 
individuals. RGD research that combines genomic and phenomic with other ‘omics’ data also has potential for 
improved diagnostic yield and mechanistic understanding; however, these endeavors face significant 
obstacles, notably a dearth of multiomic data fusion and analysis methods. Practical RGD clinical diagnosis 
additionally requires patient-specific diagnostic pathways, the lack of which has resulted in unacceptably 
complex and lengthy diagnostic odysseys that place a significant burden on individuals suffering with RGDs. 
Our proposal addresses these critical limitations through novel artificial intelligence (AI) method development 
that will integrate information-rich biological ontologies with multiomic data. We will first extend graph neural 
network node representation learning methods and develop a custom genetic search algorithm to enable 
discovery of a shared population phenotype among individuals in the absence of a disease specification, thus 
enabling a gene-to-patient research paradigm. We will further apply these methods to integrate node 
representations for a tissue-to-gene expression knowledge graph with genetic sequence data and clinically 
accessible tissue (CAT) transcriptomic results in a transformer-based deep learning model to predict tissue-specific 
aberrant splicing pathogenicity. Finally, we will combine these methods in a pilot clinical decision 
support system to recommend personalized genetic testing and clinical tests to support RGD diagnosis. This 
pilot system will leverage large language models anchored to biological ontologies to enable clinicians and 
patients to pose questions regarding reasoning, benefits, and risks of recommended clinical tests in an 
efficient, flexible, conversational form. In combination, these outcomes will dramatically improve researchers’ 
ability to utilize multiomic data to elucidate the mechanisms by which variants affect phenotype and guide 
clinicians in the diagnosis and care of individuals with RGDs, thereby substantially ...

## Key facts

- **NIH application ID:** 11013492
- **Project number:** 5P20GM139769-04
- **Recipient organization:** CLEMSON UNIVERSITY
- **Principal Investigator:** Robert R. H. Anholt
- **Activity code:** P20
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $240,580
- **Award type:** 5
- **Project period:** 2024-02-01 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11013492

## Citation

> US National Institutes of Health, RePORTER application 11013492, Integration of Biomedical Ontologies with Deep Learning AI for Research and Diagnosis of Rare Diseases (5P20GM139769-04). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/11013492. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
