PROJECT SUMMARY
Alzheimer's disease (AD) is a devastating neurodegenerative disease that affects 6.2M Americans, yet current therapies are not effective at preventing or slowing cognitive decline [1]. Neuropsychiatric symptoms (NPS) are core features of AD and related dementias; they are associated with major adverse effects on daily function and quality of life, and they shorten the time to institutionalization. The overarching goal of the parent grant R01AG067025 is to integrate single nucleus transcriptome profiles with detailed NPS phenotype data from each donor in order to identify dysregulated genes associated with disease trajectory, identify clusters of donors with different gene expression disease signatures, and nominate genes and pathways for targeting with novel therapeutics. The compendium of single nucleus transcriptome profiles generated by the parent grant, comprising ~7.2M nuclei from ~1,800 total donors, is a remarkable resource. Yet mining these profiles to advance knowledge of AD etiology requires analytical workflows that scale to the unprecedented size of these and other emerging datasets. Existing workflows for multi-donor single cell and nucleus transcriptome data were either 1) designed for a small number of donors, and so cannot take advantage of the large-scale and complex study design used here, or 2) adapted from bulk transcriptome analyses, and do not currently scale to hundreds of donors, dozens of cell types and millions of cells. Addressing pressing hypotheses about AD biology therefore requires analytical workflows designed and engineered with the challenges of multi-donor single cell and nucleus transcriptome data in mind. In this Supplement, we propose developing a scalable, open source analytical workflow for multi-donor single cell/nucleus transcriptome data, motivated by our previous work on linear mixed models [2,3].
We have previously applied linear mixed models to analyze bulk transcriptome profiles, and developed the open source variancePartition package to perform differential expression testing, account for technical batch effects, and characterize the multiple biological and technical sources of expression variation. While the current software has facilitated analysis of bulk transcriptomic and epigenomic profiles by our group and many others, its application to multi-donor single nucleus data is currently limited by the ad hoc design of the variancePartition codebase. To address these limitations, here we propose (Aim 1) Scaling this analytical workflow to emerging datasets using best practices in software engineering, code refactoring, and empirical testing across multiple computing environments; and (Aim 2) Enabling broader use (a) by computational biologists, through vignettes that illustrate applications of the software on public datasets, and (b) by open source developers, through improved code design and documentation. Overall, reconceiving the analyti...