Project Description
DMS/NIGMS 2: Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data
A Significance
A.1 Importance of the Problem to Be Addressed
Single-cell RNA-sequencing (scRNA-seq) technologies have enabled biological discoveries that were impossible with bulk RNA-seq, such as the identification of new gene regulatory activities and cell types at single-cell resolution. However, translating the fundamental biological knowledge gained from scRNA-seq into improved disease diagnosis, treatment, and prevention requires new methods for comparing the molecular differences between normal and pathological cells/tissues, and between control and case/treatment groups. Although the identification of differentially expressed genes across two sample groups has been studied extensively, the vast majority of existing methods for identifying gene regulatory networks (GRNs) and cell types have so far focused on scRNA-seq data generated under a single experimental condition. In principle, these methods can be applied to one experimental condition at a time, followed by post hoc comparisons to find the differences caused by experimental interventions. Compared with joint modeling approaches, however, this two-step procedure is deemed less efficient and more susceptible to false discoveries, because uncertainty is not properly propagated from the first step to the second. Moreover, most scRNA-seq network models are correlative in nature and do not infer causal gene regulatory relationships. There is, therefore, a critical need to develop new models that identify the effects of experimental interventions on causal gene regulation and cell composition by jointly modeling scRNA-seq data across experimental groups.
In the absence of such tools, a mechanistic understanding of gene regulation and cell differentiation, and the full realization of the translational value of scRNA-seq studies, will likely remain difficult.
A.2 Rigor of Prior Research
Aim 1. Many existing scRNA-seq network approaches adapt standard association measures to zero-inflated scRNA-seq data, e.g., Pearson correlation [1] and mutual information [2]. A common limitation of these methods is that they quantify only marginal dependencies, which are susceptible to spurious indirect associations [3]. Graphical models, which capture conditional associations, are powerful alternatives to marginal association measures. Numerous general-purpose methods have been proposed [4, 5], including extensions to non-Gaussian data [6–9]. Specifically for scRNA-seq data, two undirected graphical models based on neighborhood selection, including Co-I Cai's work [10, 11], were recently proposed; however, they do not infer causal gene regulation. To identify causal relationships, several alternative methods [12, 13] were developed. However, these methods either ignore the count nature of scRNA-seq data, require a known pseudotime (whic...