Hardwiring Mechanism into Predicting Cancer Phenotypes by Computational Learning

NIH RePORTER · NIH · R01 · $234,852

Abstract

DESCRIPTION (provided by applicant): Few biomarkers derived from genome-scale data have translated into improved clinical classification of cancer subtypes, in spite of the wealth of available genome-wide studies and of the corresponding application of numerous statistical algorithms. This widespread shortcoming derives from the pervasive use of "off the shelf" algorithms and machine learning techniques developed for image classification and language processing, which are naïve of the underlying biology of the system. Furthermore, for genome-wide data, the number of samples is often small relative to the number of potential candidate biomarkers, resulting in variable accuracy on independent test data despite high accuracy in the samples used for discovery, which contributes to the failure of clinical biomarkers. This problem - the so-called "curse of dimensionality" - is further exacerbated by the prohibitive cost of dramatically increasing sample size and by patient stratification into smaller subgroups for personalized and precision medicine. Disease phenotypes arise from distinct and specific perturbations in selected networks and pathways defined by the interactions of their molecular constituents. In cancer, these perturbations may reside in gene regulatory network topology and state, in cell signaling activity, or in metabolic conditions. We hypothesize that by leveraging such prior biological information on cancer biology we will be able to reduce model complexity and build mechanistically justified predictive models. To pursue this hypothesis, we will develop an analytical framework to embed mechanistic constraints derived from network biology into the statistical learning process itself. Hence, this application will develop a novel suite of statistical learning algorithms that embed (Aim 1) gene expression regulatory networks, (Aim 2) cell signaling activity, and (Aim 3) metabolism to classify breast and prostate cancer.
Throughout the study we will work closely with clinical collaborators to ensure that our methods improve on current predictive and prognostic models. Moreover, because we will also generate mechanistic classifiers based on gene expression measurements obtained from clinical assays that are already commercially available (i.e., MammaPrint® and Decipher®), our innovative models and predictors will also be readily available for clinical translation. Our mechanism-driven classifiers will simultaneously have greater accuracy and interpretability than classifiers developed without regard for the underlying biology of the disease. Furthermore, embedding biological mechanisms in the classifiers will also facilitate the identification of alternative therapeutic targets specific to each cancer subtype, potentially improving patient prognosis and health outcomes. Finally, the substantial curation of molecular pathways and biological networks we will carry out in the project will also provide a powerful resource for ...

Key facts

NIH application ID: 10328651
Project number: 7R01CA200859-06
Recipient: WEILL MEDICAL COLL OF CORNELL UNIV
Principal Investigator: Luigi Marchionni
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $234,852
Award type: 7
Project period: 2016-04-05 → 2023-03-31