PROJECT SUMMARY
Comparative functional genomics offers a powerful framework to study the molecular underpinnings of species-specific traits. Gene regulatory networks (GRNs), which control the precise, context-specific expression patterns of genes, play a significant role in diversifying phenotypes across species. These networks are central to cell type-specific function and are disrupted in many diseases. However, comparing GRNs across species has been challenging because too few samples are available across matched biological contexts. Single cell omic technologies, such as single cell RNA-seq (scRNA-seq) and single cell ATAC-seq (scATAC-seq), are revolutionizing biology by enabling researchers to profile the activity of nearly all genomic regions in each individual cell. Single cell omic studies are quickly expanding to multiple species, providing unprecedented opportunities to define cell types and their underlying GRNs and to study their evolution. However, computational methods for defining cell types and cell type-specific GRNs across species are in their infancy. In particular, samples in a multi-species scRNA-seq dataset are related by a phylogeny, but existing integration approaches do not model these relationships. Furthermore, existing approaches are restricted to one-to-one relationships across species, which makes it difficult to study some of the major sources of evolutionary innovation (e.g., duplications) in cell type identity. In this project, we will develop novel computational methods to tackle two problems: (a) defining cell types and their lineage relationships across species from scRNA-seq and scATAC-seq datasets, and (b) inferring and comparing cell type-specific GRNs across species from scRNA-seq and scATAC-seq data.
Our tools will be based on machine learning methods, namely probabilistic graphical models, multi-task and multi-view learning, and matrix factorization, which offer principled frameworks to integrate information across species. We will first test these tools on human and mouse scRNA-seq/scATAC-seq datasets from our collaborators and published studies. We will then demonstrate the full potential of our tools on a novel multi-species kidney scRNA-seq/scATAC-seq dataset that we will collect to study normal kidney function as well as compensatory renal growth, the process by which the remaining kidney compensates after surgical removal of the other. We will identify conserved and diverged regulatory networks, which will be used to prioritize sequence and protein regulators for validation studies with CRISPR and siRNA. Our analysis will yield key insights into how GRNs evolve across species and how they establish different cell types. Our approaches and novel datasets will provide critical insight into the molecular programs governing kidney structure and function, which could have a significant clinical impact for patients with kidney disease. Our methods will constitute a sui...