PROJECT SUMMARY
Whole-genome sequencing has become a popular approach for genome-wide characterization of genomic alterations, ranging from single nucleotide variants and indels to copy number changes and complex structural alterations. However, standard bulk sequencing reports only the population average across cells, so our understanding of genetic heterogeneity and clonal dynamics remains inadequate. In the proposed work, we aim to develop computational methods for the analysis of single cell whole-genome sequencing data. Because of the allelic bias and artifacts introduced by the DNA amplification step, accurate identification of genomic alterations in single cells is challenging. In Aim 1, we will develop methods to identify single nucleotide variants and indels, building on our experience in the analysis of single neurons and utilizing the latest amplification techniques. In Aim 2, we will focus on methods to detect copy number variants, structural variants, and tandem repeat mutations. We will employ machine learning models, including graph- and autoencoder-based deep learning approaches. In Aim 3, we will apply the methods devised in the first two aims to several important biological questions that are best resolved by single cell DNA sequencing. These include identification of off-target effects and on-target efficiency of genome editing, lineage tracing in development using somatic mutations as endogenous barcodes, correlation of driver mutations and copy number alterations in cancer cells, and quantification of the impact of environmental exposures on the mutational landscape. A single cell view of these biological phenomena will yield new insights into the underlying processes, and the tools developed in this project will be applicable to a wide range of biological and biomedical problems.