PROJECT SUMMARY

Kinases are involved in a variety of physiological functions, such as signal transduction, transcription, development, and cell cycle regulation. Consequently, dysregulation of protein kinases is associated with a range of diseases, including cancer, metabolic diseases, and central nervous system disorders. More than 60 drugs targeting kinases have been approved by the FDA, making kinases one of the most druggable protein families. Despite their biomedical importance, a large group of human protein kinases remains highly understudied. For these proteins, often referred to as “dark kinases”, including by the Illuminating the Druggable Genome (IDG) program, little is known about their substrates, which ultimately determine their cellular function. To address this challenge, we will develop a novel computational framework to predict kinase-substrate interactions by combining biologically relevant multi-modal data sources with cutting-edge machine learning methodologies. Specifically, we will first derive features that quantify potential interactions between kinases and substrates from diverse data sources, such as protein structure and dynamics, gene expression profiles, protein-protein and protein-small molecule interaction networks, and evolutionary information (Aim 1). We will then develop predictors of kinase-substrate interactions using a powerful machine learning methodology called Ensemble Integration (EI; Aim 2). EI builds on heterogeneous ensembles, which can aggregate any number and variety of base predictors derived from these data sources and benefit from both the consensus and the diversity among them. Because of this flexibility, EI can produce more accurate predictions from multi-modal datasets than other established data integration methodologies, and we expect the same advantage in this project.
Finally, we will evaluate the kinase-substrate interactions predicted by the EI-based model developed in Aim 2 using both computational and experimental methods (Aim 3). We will share the experimentally validated interactions, the most confident predictions from the EI model, and all data and software generated during this project through our KinaMetrix web server, as well as through other public data and software repositories. At its culmination, this project will produce novel and validated computational methods and software to predict substrates of kinases, validated and high-confidence kinase-substrate interactions for IDG dark kinases, and a public web server (KinaMetrix) to share these products. We expect these products to be highly useful for the study of dark kinases, especially in the IDG effort, as well as for better understanding kinase function and improving the use of kinases in drug development. Our approach is also expected to be generally applicable to other druggable protein families, such as ion channels and GPCRs.
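
The ensemble idea behind Aim 2 can be illustrated with a minimal sketch. This is not the EI implementation: it assumes one base predictor per data modality has already scored each candidate kinase-substrate pair, and it combines those scores with a performance-weighted average, which is only one simple way to aggregate heterogeneous predictors. The modality names, scores, labels, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical scores in [0, 1] from base predictors trained on separate
# modalities; each list holds one score per candidate kinase-substrate pair.
base_scores: Dict[str, List[float]] = {
    "structure": [0.9, 0.2, 0.6, 0.4],
    "expression": [0.7, 0.4, 0.3, 0.5],
    "network": [0.8, 0.1, 0.7, 0.2],
}
# Toy labels for the same pairs (1 = known interaction, 0 = no interaction).
labels: List[int] = [1, 0, 1, 0]


def accuracy(scores: List[float], labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of pairs whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def weighted_mean_ensemble(
    base_scores: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Combine per-modality scores into one ensemble score per pair."""
    total = sum(weights.values())
    n_pairs = len(next(iter(base_scores.values())))
    return [
        sum(weights[m] * base_scores[m][i] for m in base_scores) / total
        for i in range(n_pairs)
    ]


# Weight each base predictor by its accuracy. For brevity the weights are
# computed on the same toy labels; a real analysis would use held-out data.
weights = {m: accuracy(s, labels) for m, s in base_scores.items()}
ensemble = weighted_mean_ensemble(base_scores, weights)

print("weights:", weights)
print("ensemble scores:", [round(s, 2) for s in ensemble])
print("ensemble accuracy:", accuracy(ensemble, labels))
```

In this toy setting the weaker modality is down-weighted rather than discarded, so the ensemble still draws on every data source while leaning on the more reliable ones.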