# Protein Knowledge Networks and Semantic Computing for Disease Discovery

> **NIH R35** · UNIVERSITY OF DELAWARE · 2023 · $433,379

## Abstract

The growing volume and breadth of information in the scientific literature and biomedical databases
make it challenging for the research community to exploit that content for discovery. This MIRA grant
application will advance our knowledge mining and semantic computing system to accelerate data-driven
discovery and the understanding of gene-disease-drug relationships. We have employed natural language
processing and machine learning approaches in a generalizable framework for bioentity and relation
extraction from large-scale text. Our Protein Ontology supports protein-centric semantic integration of
biomedical data for both human understanding and computational reasoning. We have also developed a
resource to support functional interpretation and analysis of protein post-translational modifications
(PTMs) across modification types and organisms. Building on our computational algorithms,
bioinformatics infrastructure and community interactions, we will further develop literature mining tools to
support automated information extraction across the bibliome and open linked data models for semantic
integration of biomedical data from heterogeneous resources. Our text mining tools will be trained for
different use cases using deep learning methods. We will develop RDF (Resource Description
Framework) semantic models in an increasingly computable, inferable and explainable knowledge
system to assist in hypothesis generation. We will present evidence in the form of textual artifacts and
semantic models to ensure unbiased analysis and interpretation of results to promote rigorous and
reproducible research. We will develop scientific case studies to drive the system development.
Examples include PTM disease variant and enrichment analyses for drug target identification,
genotype-phenotype knowledge mining for Alzheimer's Disease understanding, and gene-disease-drug knowledge
network construction for COVID-19 drug repurposing. To foster community engagement, we will host
workshops and hackathons to address critical fundamental research questions and emerging disease
scenarios. We have fully adopted the FAIR (Findable, Accessible, Interoperable, Reusable) principles for
resource sharing. All data, tools and research results will be broadly disseminated from the project
website, accessible programmatically via RESTful API, queryable via SPARQL endpoints, and
dockerized for community code reuse. The successful completion of this research will thus support
scalable, integrative and collaborative knowledge discovery to accelerate disease understanding and
drug target discovery.
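
As a rough illustration of the gene-disease-drug knowledge network idea described above, the following minimal sketch stores subject-predicate-object triples in memory and answers a two-hop question: which drugs target a gene associated with a given disease. It is not the project's data model or API. Every identifier (`ex:GeneA`, `ex:associatedWith`, `ex:targets`, and so on) and both function names are hypothetical placeholders, and a real deployment would presumably use an RDF store queried through SPARQL rather than a Python set.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical placeholder triples (subject, predicate, object).
# None of these identifiers or relations come from the project itself.
TRIPLES: set[Triple] = {
    ("ex:GeneA", "ex:associatedWith", "ex:DiseaseX"),
    ("ex:GeneB", "ex:associatedWith", "ex:DiseaseX"),
    ("ex:DrugQ", "ex:targets", "ex:GeneA"),
    ("ex:DrugR", "ex:targets", "ex:GeneC"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def candidate_drugs(disease: str) -> list[str]:
    """Two-hop join: disease -> associated genes -> drugs targeting them."""
    genes = {subj for subj, _, _ in match(p="ex:associatedWith", o=disease)}
    drugs = {subj for subj, _, obj in match(p="ex:targets") if obj in genes}
    return sorted(drugs)


print(candidate_drugs("ex:DiseaseX"))  # ['ex:DrugQ']
```

The wildcard pattern in `match` loosely mirrors a single SPARQL triple pattern, and `candidate_drugs` mirrors joining two such patterns on a shared gene variable.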

## Key facts

- **NIH application ID:** 10698082
- **Project number:** 5R35GM141873-03
- **Recipient organization:** UNIVERSITY OF DELAWARE
- **Principal Investigator:** CATHY H. WU
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $433,379
- **Award type:** 5
- **Project period:** 2021-08-25 → 2026-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10698082

## Citation

> US National Institutes of Health, RePORTER application 10698082, Protein Knowledge Networks and Semantic Computing for Disease Discovery (5R35GM141873-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10698082. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
