Project summary/abstract
In this application, we request continuation of MH123184, which aimed to understand how genetic variation alters transcription in specific cells and thereby produces psychopathology. Our research developed statistical methods to integrate single-cell and tissue-level transcriptomic data. We targeted methods to identify gene communities, defined in terms of cell type and spatiotemporal window, to understand how genes act in concert to confer risk for psychopathology. We also took advantage of an exciting new avenue of research to approach these challenges, namely CRISPR screening. This innovation has emerged as a powerful tool to characterize the effects of genetic perturbations on the entire transcriptome at a single-cell level. Here we propose research covering three related themes, all of which capitalize on CRISPR advancements: (1) develop powerful and well-calibrated tests for the effect of CRISPR perturbations on gene expression by inferring latent factors; (2) develop methods for removing the effect of unmeasured confounders in high-throughput screens; and (3) develop methods for imputation and denoising for multiomic data that facilitate downstream testing of omic readouts. Each of these aims is motivated by pressing needs in the field. First, due to small samples and the sparsity of the response variable, it is essential that we enhance the power and interpretability of CRISPR tests by accounting for co-regulation and convergent function of genes. Aim 1 achieves this purpose by estimating latent factors that represent co-regulated genes and by inferring a similarity matrix among gene perturbations. As CRISPR screens advance to more biologically complex settings, such as model organisms, unmeasured confounders will play a more important role, and new methods are needed to control for these effects.
Aim 2 develops two approaches to this challenge: an innovative use of negative control variables, as motivated by the causal literature, and key advances to the classic surrogate variable analysis method. For the field to move toward efficient use of multiomic data, data derived from multiple sources will be required. These resources will invariably have missing data. Methods to account for imputation of missing data are needed. Tools developed for variational autoencoders show great promise; however, as described in Aim 3, they need to be paired with semiparametric inference tools to ensure robust and well-calibrated downstream analysis. By applying what we learn from these three aims to available data, most from distributed resources and some from our collaborations, we expect to shed more light on the neurobiological mechanisms of mental illness. We are well positioned to move between theory and data because we have a diverse team of investigators led by the PI (Roeder), who has decades of experience in the field of statistical genomics, and co-investigators Wasserman and Lei, who are experts in theory and methods for h...