# Multi-modal data integration to identify kinase substrates

> **NIH U01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2022 · $498,513

## Abstract

PROJECT SUMMARY
Kinases are involved in a variety of physiological functions, such as signal transduction, transcription,
development, and cell cycle regulation. Consequently, dysregulation of protein kinases is associated with a range of
diseases, including cancer, metabolic diseases, and central nervous system disorders. More than 60 drugs
targeting kinases have been approved by the FDA, making kinases one of the most druggable protein families.
Despite their biomedical importance, a large group of human protein kinases remains highly understudied. These
proteins, often called “dark kinases” (a term used by, among others, the Illuminating the Druggable Genome (IDG)
program), have poorly characterized substrate(s), which ultimately determine their cellular function. To address this
challenge, we will develop a novel computational framework to predict kinase-substrate interactions by
combining biologically relevant multi-modal data sources with cutting-edge machine learning methodologies.
Specifically, we will first derive features that quantify potential interactions between kinases and substrates from
diverse data sources, such as protein structure and dynamics, gene expression profiles, protein-protein and
protein-small molecule interaction networks, and evolutionary information (Aim 1). We will then develop
predictors of kinase-substrate interactions using a powerful machine learning methodology named Ensemble
Integration (EI; Aim 2). EI is based on the concept of heterogeneous ensembles, which can aggregate an
unrestricted number and variety of base predictors derived from the diverse data sources above, and can benefit
from both the consensus and the diversity among these predictors. Owing to this flexibility, EI can produce
more accurate predictions from multi-modal datasets than other established data integration methodologies, and we
expect it to do so in this project as well. Finally, we will evaluate the kinase-substrate interactions predicted by the
EI-based predictive model developed in Aim 2 using both computational and experimental methods (Aim 3). We
will also share the experimentally validated interactions, the most confident predictions from the EI model, and
all the data and software generated during this project through our KinaMetrix web server, as well as other public
data and software repositories. At its culmination, this project will produce novel and validated computational
methods and software to predict substrates of kinases, validated and high-confidence kinase-substrate
interactions for IDG dark kinases, and a public web server (KinaMetrix) to share these products. We expect that
these products will be highly useful for studying dark kinases, especially within the IDG effort, for better
understanding kinase function, and for improving the use of kinases in drug development. Our approach is also
expected to be generally applicable to other druggable protein families, such as ion channels and GPCRs.
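The Ensemble Integration idea described above (separate base predictors per data source, whose outputs are then combined) can be illustrated with a short Python sketch. This is a hypothetical, minimal example, not the project's EI implementation or its API: the toy "modalities" (`structure`, `expression`, `network`), all function names, and the use of logistic regression as the only base learner are assumptions made for illustration. The abstract describes EI as aggregating an unrestricted number and variety of base predictors; here each modality gets just one base model, and the outputs are integrated by two common heterogeneous-ensemble strategies, mean aggregation and stacking, then scored with AUROC as one possible computational check.

```python
import math
import random


def predict_proba(model, x):
    """Logistic-regression probability for one feature vector x."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=100, lr=0.1):
    """Fit logistic regression by full-batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def out_of_fold_scores(X, y, n_folds=5, seed=0):
    """Score every sample with a model trained only on the other folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for k in range(n_folds):
        held_out = set(idx[k::n_folds])
        train = [i for i in idx if i not in held_out]
        model = train_logreg([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = predict_proba(model, X[i])
    return scores


def ensemble_integration_sketch(modalities, y):
    """Hypothetical EI-style pipeline: one base predictor per modality, then integration.

    modalities maps a data-source name to a feature matrix whose rows are
    kinase-substrate pairs aligned with the labels y (1 = known interaction).
    """
    # Level 1: a base predictor per data source, scored out-of-fold.
    base = {name: out_of_fold_scores(X, y) for name, X in modalities.items()}
    names = sorted(base)
    meta_X = [[base[name][i] for name in names] for i in range(len(y))]
    # Integration 1: mean aggregation (simple consensus of the base scores).
    mean_scores = [sum(row) / len(row) for row in meta_X]
    # Integration 2: stacking (a meta-learner weighs each data source).
    stacked_scores = out_of_fold_scores(meta_X, y, seed=1)
    return base, mean_scores, stacked_scores


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n=200, seed=42):
    """Hypothetical toy data: each modality carries weak, independent signal in feature 0."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]

    def modality(dim, shift):
        return [[rng.gauss(shift * (2 * yi - 1) if j == 0 else 0.0, 1.0)
                 for j in range(dim)] for yi in y]

    return {"structure": modality(3, 0.6),
            "expression": modality(4, 0.5),
            "network": modality(2, 0.7)}, y


if __name__ == "__main__":
    modalities, y = make_toy_data()
    base, mean_scores, stacked_scores = ensemble_integration_sketch(modalities, y)
    for name, scores in sorted(base.items()):
        print(f"{name:>10} AUROC: {auroc(scores, y):.3f}")
    print(f"{'mean':>10} AUROC: {auroc(mean_scores, y):.3f}")
    print(f"{'stacked':>10} AUROC: {auroc(stacked_scores, y):.3f}")
```

Out-of-fold scoring keeps each base model from scoring pairs it was trained on, which is what lets the stacking meta-learner learn how much to trust each data source. For an unbiased performance estimate, the whole procedure would be nested inside an outer cross-validation loop; that step is omitted here for brevity.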

## Key facts

- **NIH application ID:** 10451941
- **Project number:** 1U01CA271318-01
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Gaurav Pandey
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $498,513
- **Award type:** 1 (new application)
- **Project period:** 2022-07-05 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10451941

## Citation

> US National Institutes of Health, RePORTER application 10451941, Multi-modal data integration to identify kinase substrates (1U01CA271318-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10451941. Licensed CC0.

