Uncovering therapeutic-associated biomarkers via machine learning and feature engineering approaches

NIH RePORTER · NIH · R03 · $318,000

Abstract

PROJECT SUMMARY

Identifying biomarkers that are diagnostic, robust, and generalizable across individuals while also possessing therapeutic value is one of the most sought-after goals in medicine. However, there are numerous challenges in identifying such robust therapy-associated biomarkers (TABs). For example, most current methods seek statistically significant differential biological signals in general patient cohorts but fail to account for the heterogeneous genetic backgrounds and phenotypic diversity of individual patients. Our recent studies, which applied newly developed machine learning-based feature engineering approaches in a pan-cancer study across 12 cancer types, showed that biologically constrained features (herein named invariant features) are universal in disease and can be used to classify individual cancers. Importantly, we also showed that invariant features can be used to build de novo biological networks and to discover network hubs that can be successfully used to infer the expression of associated genes. As such, invariant features can act as information encoders. Using information from the Drug Repurposing Hub, we show that these hub genes are also drug targets. Collectively, these observations suggest that invariant feature hubs can be TAB candidates.

We propose that, in the new light of biological constraints, we can use a dynamic approach to biomarker discovery that captures both the genetic heterogeneity and the molecular fluctuation across individual patients. Our central hypothesis is that disease states show constraints in their molecular activities, and that identifiable invariant features possess diagnostic and therapeutic value. The main objective of this proposal is to uncover TABs using selected NIH Common Fund datasets (namely, exRNA, GTEx, LINCS, and IDG).

In Aim 1, we will test the hypothesis that biologically constrained invariant features are universal to most, if not all, biological states. We will show this by finding invariant features with respect to each biological state in the selected Common Fund datasets. We will conduct comparative analyses of disease and normal states in order to dissect disease-specific invariant features. Next, in Aim 2, we will test the hypothesis that invariant feature hubs are TABs. We will show this by determining the diagnostic capability of invariant feature hubs through their “encodability”, that is, their ability to reconstruct the expression values of their associated invariant feature genes in different individual patients diagnosed with the same disease type. Finally, we will map these invariant feature hubs to IDG and DrugBank to determine their druggability. For understudied hubs with no known drugs, we will perform computational analyses such as homology modeling and machine learning to characterize their druggability. We expect timely accomplishment of the proposed aims, and successful completion of this project will no doubt provide added value for the selected Common Fund datasets, whil...
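The “encodability” test described in Aim 2 can be illustrated with a toy example. The sketch below is not the project's method: it assumes a single hub gene, a simple least-squares line as the reconstruction model, and held-out R² as the score, and it runs on synthetic values rather than any Common Fund dataset. All function and variable names (`fit_line`, `r_squared`, `encodability`, `hub`, `linked`, `unlinked`) are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination; can be negative on held-out data."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def encodability(hub_train, gene_train, hub_test, gene_test):
    """Fit on training patients, then score reconstruction on held-out patients."""
    slope, intercept = fit_line(hub_train, gene_train)
    predicted = [slope * h + intercept for h in hub_test]
    return r_squared(gene_test, predicted)


# Synthetic expression values for 200 hypothetical patients (assumption:
# one gene tracks the hub linearly with noise, the other is independent).
rng = random.Random(0)
hub = [rng.gauss(5.0, 1.0) for _ in range(200)]
linked = [2.0 * h + 1.0 + rng.gauss(0.0, 0.3) for h in hub]
unlinked = [rng.gauss(5.0, 1.0) for _ in hub]

split = 150  # first 150 patients for fitting, last 50 held out
score_linked = encodability(hub[:split], linked[:split], hub[split:], linked[split:])
score_unlinked = encodability(hub[:split], unlinked[:split], hub[split:], unlinked[split:])

print(f"held-out R^2, hub-associated gene: {score_linked:.3f}")
print(f"held-out R^2, unrelated gene:      {score_unlinked:.3f}")
```

In this toy setting, a gene whose expression follows the hub is reconstructed well on held-out patients, while an unrelated gene is not. A real analysis would likely use many hubs and a richer model.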

Key facts

NIH application ID: 10564098
Project number: 1R03OD034496-01
Recipient: MAYO CLINIC ROCHESTER
Principal Investigator: Hu Li
Activity code: R03
Funding institute: NIH
Fiscal year: 2022
Award amount: $318,000
Award type: 1
Project period: 2022-09-20 → 2024-09-19