PROJECT SUMMARY
The last twenty years have seen extensive growth in the sequencing of cancer genomes, leading to a dramatically improved understanding of the role of genetic and epigenetic alterations in cancer. This growth has largely been enabled by advances in high-throughput “second-generation” sequencing technologies and analysis methods that characterize cancer genomes using short reads. Recently, a new generation of high-throughput long-read sequencing instruments, primarily from Pacific Biosciences and Oxford Nanopore, has become available and is poised to displace short-read sequencing for many applications. We and others have used these technologies to discover tens of thousands of variants per cancer genome that are not detectable using short reads, including structural variants and differentially methylated regions in known oncogenes and cancer risk genes. These technologies have the potential to address many open questions in cancer biology. However, the analysis of long-read sequencing data is computationally demanding and requires specialized algorithms that are either too inefficient to use at scale or do not yet exist. In this proposal, we will address several gaps in the application of long-read technology to basic research and clinical use in cancer genomics. First, we will develop improved methods for finding structural variants and complex repeat expansions from long reads. Both are major diagnostic and prognostic indicators of disease, yet neither is accurately identified by existing methods. Leveraging the improved phasing capabilities of long reads, we will also detect mosaic variants, revealing tumor heterogeneity and variants in precancerous tissues. Next, we will apply machine learning and systems-level advances to accelerate and improve the comparison of variants across large patient cohorts.
Critically, this work will compensate for the error-prone nature of single-molecule long-read sequencing. This will make comparisons between tumor-normal samples, or across pedigrees of related patients, more accurate so that recurrent driver mutations can be reliably identified. Finally, we will develop integrative methods for the joint analysis of the genome, transcriptome, and epigenome of cancer samples. These advances will improve the identification of fusion genes and enable entirely new forms of epigenetic analysis, such as allele-specific analysis of methylation across transposable elements and other repetitive elements. Synthesizing the many thousands of novel variants detected with our methods, we will then develop algorithms to identify and evaluate recurrent genetic or epigenetic variants as putative driver mutations. All methods will be released as open-source software and will empower us, our ITCR collaborators, and the cancer genomics community at large to study genetic and epigenetic variants with near-perfect accuracy and thereby unlock many new associations with treatment and d...