Project Summary/Abstract
Despite substantial advances in the diagnosis and treatment of cancer over the past several decades, cancer remains the second leading cause of death worldwide. Many current treatments target defects in the DNA damage response (DDR) of cancer cells, including immunotherapy in mismatch repair-deficient cancers and PARP inhibitors in the context of homologous recombination deficiency. However, we are currently unable to reliably determine which DDR defects are present in a given cancer sample, which severely limits our ability to exploit these therapeutic vulnerabilities. Structural variants (SVs), genomic rearrangements formed as a product of aberrant double-strand break repair, hold promise as biomarkers of DDR state. Indeed, SVs affect a larger proportion of the cancer genome than any other form of genetic alteration and have features that reflect their mechanism of formation. We hypothesize that novel computational methods tailored to the complexity of SV features, as well as associations with newly discovered SV features, will enable accurate assessment of variation in DDR across cancers. In Aim 1, we will follow up on our laboratory's recent discovery of significant discontinuous homology, or microhomeology (MHe), in cancer SVs. First, we will engineer defined DDR defects in isogenic cell lines and assess their effects on MHe. We will then validate these in vitro findings by assessing MHe patterns in human tumors that have orthogonal evidence of DDR pathway defects. In Aim 2, we introduce Quant-HDP, a Bayesian nonparametric generative algorithm based on the Hierarchical Dirichlet Process, which models complex SV features to detect signatures (patterns of SVs) associated with specific biological processes. We will apply this method both to publicly available cancer cohorts and to internally sequenced pre- and post-chemotherapy gliomas and endometrial cancers.
We will then validate putative signature associations in isogenic cell lines. In sum, this proposal uses untapped genomic features and sophisticated computational models to distinguish between different DDR states and exposures, with direct clinical implications.