Project Summary

Carcinogenesis is a complex process involving somatic mutations in a number of key biological pathways and processes. Studying the temporal order in which somatic mutations occur is essential for understanding the biological mechanisms of cancer development and for identifying new therapeutic targets and treatment options. The first and best-known example of an ordering of mutations comes from colon cancer, which is frequently initiated by mutations affecting the Wnt signaling pathway and then progresses through subsequent mutations in genes of the MAPK, PI3K, TGF-beta, and p53 signaling pathways. For many other cancer types, however, the temporal order of mutations remains largely unknown. Somatic mutation profiling by high-throughput DNA sequencing has created an unprecedented opportunity to study cancer progression with statistical and computational methods. We and others have developed methods that infer the temporal order of somatic mutations by combining mutation profile data from a cohort of patients. A major limitation of current methods, however, is that they consider only the presence or absence of mutations in a patient's tumor and do not account for intra-tumoral heterogeneity (ITH). ITH refers to the presence of multiple cell populations, i.e., subclones, with distinct mutation profiles within a single patient's tumor. ITH, which can be inferred from single- or multi-region bulk sequencing or from single-cell sequencing, is usually represented by a phylogenetic tree whose nodes correspond to subclones and whose edges encode the evolutionary relationships among them. Because a phylogenetic tree describes the temporal order of mutations within an individual patient's tumor, incorporating this in-depth intra-patient information into tumor progression analysis across patients is likely to substantially increase the power and accuracy of that analysis.
Another important priority in cancer research is the identification of molecular subtypes. Because cancer is a complex disease, patients with the same cancer type may have very different prognoses and responses to therapy. Classifying patients further into subtypes allows clinicians to better predict clinical outcomes and to design more personalized treatment strategies. Applied to omics profiling data, statistical and machine learning methods have emerged as powerful tools for identifying molecular cancer subtypes. However, owing to the high complexity of cancer omics data and limited sample sizes, obtaining stable and biologically interpretable results remains challenging. Recent work has advocated incorporating biological knowledge and structure into the construction of statistical and machine learning models as a way to improve their mechanistic interpretability and robustness. To advance current capabilities, we propose to develop new statistical methods to better estimate the temporal order of pathway mutations by integrating ITH, pathway a...