# Uncovering therapeutic-associated biomarkers via machine learning and feature engineering approaches

> **NIH R03** · MAYO CLINIC ROCHESTER · 2022 · $318,000

## Abstract

PROJECT SUMMARY
Identifying biomarkers that are diagnostic, robust, and generalizable across individuals while also carrying
therapeutic value is one of the most sought-after goals in medicine. However, identifying such robust
therapy-associated biomarkers (TABs) poses numerous challenges. For example, most current methods
seek statistically significant differential biological signals in general patient cohorts but fail to
account for the heterogeneous genetic backgrounds and phenotypic diversity of individual patients. Our recent
pan-cancer study across 12 cancer types, which used newly developed machine learning-based feature
engineering approaches, showed that biologically constrained features (herein named
invariant features) are universal in disease and can be used to classify individual cancers. Importantly, we also
show that invariant features can be used to build de novo biological networks and discover network hubs that
can be used to infer the expression of associated genes. As such, invariant features can act as
information encoders. Using information from the Drug Repurposing Hub, we show that these hub genes are also
drug targets. Collectively, these observations suggest that invariant feature hubs can be TAB candidates. We
propose that under the new light of biological constraints, we can use a dynamic approach for biomarker
discovery that encapsulates both the genetic heterogeneity and molecular fluctuation across individual patients.
Our central hypothesis is that disease states show constraints in their molecular activities, and that identifiable
invariant features possess diagnostic and therapeutic value. The main objective of this proposal is to uncover
TABs using selected NIH Common Fund datasets (namely, exRNA, GTEx, LINCS, and IDG). In Aim 1, we will
test the hypothesis that biologically constrained invariant features are universal to most if not all biological states.
We will show this by finding invariant features with respect to each biological state from selected Common Fund
datasets. We will conduct comparative analyses in disease and normal states in order to dissect disease-specific
invariant features. Next, in Aim 2, we will test the hypothesis that invariant feature hubs are TABs. We will show
this by determining the diagnostic capability of invariant feature hubs through their “encodability”, that is, their ability to reconstruct the
expression values of their associated invariant feature genes in individual patients diagnosed with the
same disease type. Finally, we will map these invariant feature hubs to IDG and DrugBank to determine their
druggability. For understudied hubs with no known drugs, we will perform computational analyses such as
homology modeling and machine learning to characterize their druggability. We expect that timely accomplishment
of the proposed aims and successful completion of this project will add value to the selected
Common Fund datasets, whil...
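
The "encodability" idea in Aim 2 can be illustrated with a toy example. The sketch below is not the project's method: it assumes synthetic expression data, uses correlation thresholding as a stand-in for network construction, and scores each non-hub gene by the R² of a one-hub linear fit. All names (`find_hubs`, `encodability`, `hub_A`, `dep_A0`, and so on) and the threshold value are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def find_hubs(genes, n_hubs, threshold):
    """Return the n_hubs genes with the most strong correlations.

    genes maps a gene name to its expression values across samples.
    An edge exists when |correlation| >= threshold (an assumed rule).
    """
    names = list(genes)
    degree = {g: 0 for g in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(genes[a], genes[b])) >= threshold:
                degree[a] += 1
                degree[b] += 1
    return sorted(names, key=lambda g: -degree[g])[:n_hubs]


def encodability(genes, hubs):
    """For each non-hub gene, the best R^2 from a one-hub linear fit.

    For simple linear regression, R^2 equals the squared correlation.
    """
    return {
        g: max(pearson(genes[h], values) ** 2 for h in hubs)
        for g, values in genes.items()
        if g not in hubs
    }


# Synthetic data: two hubs, four dependents each, two unrelated genes.
rng = random.Random(0)
n_samples = 2000
genes = {
    "hub_A": [rng.gauss(0, 1) for _ in range(n_samples)],
    "hub_B": [rng.gauss(0, 1) for _ in range(n_samples)],
}
for hub in ("hub_A", "hub_B"):
    for k in range(4):
        # Each dependent gene is its hub's signal plus independent noise.
        genes[f"dep_{hub[-1]}{k}"] = [h + rng.gauss(0, 1) for h in genes[hub]]
for k in range(2):
    genes[f"noise_{k}"] = [rng.gauss(0, 1) for _ in range(n_samples)]

hubs = find_hubs(genes, n_hubs=2, threshold=0.6)
r2 = encodability(genes, hubs)
for gene, score in r2.items():
    print(f"{gene}: R^2 = {score:.2f}")
```

In this toy setup, genes driven by a hub are partly reconstructable from it, while unrelated genes are not. A real analysis would need patient-level expression data and a validated network inference step.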

## Key facts

- **NIH application ID:** 10564098
- **Project number:** 1R03OD034496-01
- **Recipient organization:** MAYO CLINIC ROCHESTER
- **Principal Investigator:** Hu Li
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $318,000
- **Award type:** 1
- **Project period:** 2022-09-20 → 2024-09-19

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10564098

## Citation

> US National Institutes of Health, RePORTER application 10564098, Uncovering therapeutic-associated biomarkers via machine learning and feature engineering approaches (1R03OD034496-01). Retrieved via AI Analytics 2026-06-25 from https://api.ai-analytics.org/grant/nih/10564098. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
