# Statistical methods for cancer progression delineation and subtype identification

> **NIH R03** · UNIVERSITY OF KENTUCKY · 2021 · $74,902

## Abstract

Carcinogenesis is a complex process involving somatic mutations in a number of key biological pathways
and processes. Studying the temporal order in which somatic mutations occur is important for
understanding the biological mechanisms of cancer development and for informing new therapeutic targets and treatment
options. The first and most recognized example of mutation order comes from colon cancer, which is frequently
initiated by mutations that affect the Wnt signaling pathway and then progresses upon subsequent mutations in
genes involved in the MAPK, PI3K, TGF-beta, and p53 signaling pathways. However, for many other cancer types,
the temporal order of mutations is still largely unknown. Somatic mutation profiling via high-throughput DNA
sequencing has provided an unprecedented opportunity to use statistical/computational methods to study
cancer progression. We and others have developed methods to infer the temporal order of somatic mutations
by combining mutation profile data from a cohort of patients. However, one major limitation of current
methods is that they consider only the presence or absence of mutations in a patient’s tumor and do not take into
account intra-tumoral heterogeneity (ITH). ITH refers to the presence of multiple cell populations, i.e.,
subclones, with distinct mutation profiles within a patient’s tumor. ITH, which can be inferred from either
single-/multi-region bulk sequencing or single-cell sequencing, is usually characterized by a phylogenetic tree
with nodes in the tree indicating different subclones and edges indicating the evolutionary relationships of
subclones. As a phylogenetic tree describes the temporal order of mutations within an individual patient’s
tumor, incorporating such in-depth intra-patient information into the tumor progression analysis across patients
is likely to substantially increase the power and accuracy of the analysis. Another important priority in cancer
research is to identify molecular subtypes. As cancer is a complex disease, patients of the same cancer type
may have very different prognoses and responses to therapy. Further classifying patients into subtypes allows
clinicians to better predict a patient’s clinical outcomes and design more personalized treatment strategies. By
harnessing omics profiling data, statistical/machine learning has emerged as a powerful tool to identify
molecular cancer subtypes. However, due to the high complexity of cancer omics data and limited sample size,
it is still challenging to obtain stable and biologically interpretable results. Recently, it has been advocated that
incorporating biological knowledge and structure into the construction of statistical/machine learning models is
a viable approach to improve the mechanistic interpretability and robustness of the models. To advance
current capabilities, we propose to develop new statistical methods to better estimate the temporal order of
pathway mutations by integrating ITH, pathway a...

## Key facts

- **NIH application ID:** 10201322
- **Project number:** 1R03CA259670-01
- **Recipient organization:** UNIVERSITY OF KENTUCKY
- **Principal Investigator:** Chi Wang
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $74,902
- **Award type:** 1
- **Project period:** 2021-07-01 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10201322

## Citation

> US National Institutes of Health, RePORTER application 10201322, Statistical methods for cancer progression delineation and subtype identification (1R03CA259670-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10201322. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*