PROJECT SUMMARY

This proposal outlines a five-year research and career development program aimed at building computational frameworks for understanding the phenotypic effects of perturbations and somatic alterations in cancer. The application builds on the candidate’s extensive PhD training in Carnegie Mellon University’s world-renowned Computer Science Department, on his prior experience as an Associate Computational Biologist at the Broad Institute, and on his large network of leading physicians and scientists in the cancer field. It also leverages his current postdoctoral appointment under Dr. Gad Getz at the Broad Institute, and the unique set of resources, facilities, collaborations, and expertise at this institute. Along with a series of relevant didactics and career-building activities, these studies will form the basis of his transition to an independent tenure-track position as a scientist guided by the goal of enabling long-term modeling and understanding of cancer as a disease.

The large-scale availability of next-generation sequencing data for cancer has offered an unprecedented characterization of the somatic changes that occur in this disease. Understanding their combinatorial phenotypic effects remains an open problem. Powerful in vitro perturbation protocols have been designed to probe these effects experimentally; however, the search space of possible perturbation combinations to screen is prohibitively large. The objective of this work is to provide principled, Artificial Intelligence (AI)-driven methodology for inferring the effects of perturbations and observed somatic alterations in cancer, a crucial step in understanding the mechanisms of the disease. The proposed work draws on recent developments in the technical fields of machine learning and causal discovery.
In particular, two Specific Aims will be pursued: (Aim 1) inferring causal graphs from single-cell RNA-seq data (with the option of pairing it with whole-exome/whole-genome sequencing); (Aim 2) using a deep generative model, along with paired whole-exome/whole-genome sequencing, to learn the latent underlying factors of variation in single-cell RNA-seq data. The proposed work also includes steps to validate these computational aims. When completed, this work will advance the field via algorithms and resources that can be used to: (1) computationally select, based on causal knowledge, combinations of targets to test in the lab; and (2) computationally infer the effects of somatic DNA alterations of interest on expression, leading to improved downstream experiment design. Taken together, the proposed aims are a crucial step in understanding mechanisms in cancer, and will lead to significant progress towards efficiently discovering drugs for this disease.