# Illuminating the Druggable Genome by Knowledge Graphs

> **NIH U01** · JACKSON LABORATORY · 2020 · $536,615

## Abstract

About 1500 of the ~20,000 protein-coding genes of the human genome can bind drug-like molecules, and yet
only about 600 are currently targeted by FDA-approved drugs. Therefore, at least 930 proteins are potential drug
targets that are not yet being utilized for human medicine and, given our incomplete state of knowledge about
the human genome, the actual number could be much higher. There is therefore a substantial unmet need to
improve our understanding of this so-called genomic dark matter in order to develop novel classes of drugs to
improve treatment of disease. Comprehensive experimental investigation of these proteins in the context of
hundreds of thousands of compounds and thousands of diseases would be prohibitively expensive, but
computational approaches could significantly refine the list. In this project we will apply two sophisticated
computational approaches to the task of predicting the most promising novel drug targets. We will integrate the
knowledge bases DrugCentral and other resources with the disease and phenotype knowledge base of the
Monarch Initiative into a semantically harmonized knowledge graph (KG). This will result in a KG with
comprehensive coverage of diseases, genes, gene functions, phenotypic abnormalities, drugs, drug
mechanisms, and drug targets. Machine learning (ML) identifies patterns from training sets and applies the
patterns to predict entities and relations in new data. ML using KGs has become a hot new research area in
computer science, but remains difficult to use for real-world applications, owing to the lack of adequate software
packages. We will therefore implement state-of-the-art learning algorithms based on deep learning on KGs by
extending and adapting selected algorithms to the task of drug and drug target discovery. We will develop an
easy-to-use software library and demonstrate its use by means of notebooks that will be designed to serve as
starting points for future computational research by other scientists, since they will contain the analysis workflow
along with documentation about each step. The human genome encodes more than 500 protein kinases, which
are enzymes that add a phosphate group to specific amino acid residues and thereby transmit a biological signal.
There are currently 35 FDA-approved protein kinase modulators acting on 38 protein kinases, which are thus
one of the most important groups of druggable proteins encoded by our genome. We will perform a detailed
computational study of this group and experimentally validate our top, novel candidate using a patient-derived
xenograft model system.

## Key facts

- **NIH application ID:** 9878081
- **Project number:** 5U01CA239108-02
- **Recipient organization:** JACKSON LABORATORY
- **Principal Investigator:** CHRISTOPHER J MUNGALL
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $536,615
- **Award type:** 5
- **Project period:** 2019-03-01 → 2022-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9878081

## Citation

> US National Institutes of Health, RePORTER application 9878081, Illuminating the Druggable Genome by Knowledge Graphs (5U01CA239108-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9878081. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*