# Resolving single-cell brain regulatory elements with bulk data supervised models

> **NIH R01** · J. DAVID GLADSTONE INSTITUTES · 2021 · $606,575

## Abstract

Gene regulation is an important determinant of the complex specialization of cells in the human brain, and
nucleotide changes within regulatory elements contribute to risk for psychiatric disorders. We therefore
hypothesize that these debilitating diseases are driven in part by genetic variants that alter gene expression and
disturb the balance and function of cell types in brain tissue. Single-cell open chromatin assays are a promising
approach to testing this hypothesis by mapping variants to regulatory elements specific to and shared across
cell populations. There are two major barriers to this strategy, for which our project proposes modeling solutions.
First, although single-cell ATAC-sequencing (scATAC-seq) is currently the best available assay, it suffers from low
resolution: an open chromatin region may be supported by zero or only a few reads in a given cell. This sparsity
makes it hard to identify coherent cell populations. We propose a network model for semi-supervised clustering
of cells in scATAC-seq data that leverages information from higher-coverage bulk tissue experiments and, when
available, single-cell RNA-sequencing (scRNA-seq). The expected outcomes of applying this model to compendia of
brain data from public repositories and our collaborators are (i) identification of open chromatin regions that
differentiate cell types and states, and (ii) discovery of resolved cell populations whose open chromatin is
enriched for psychiatric disorder-associated genetic variants. Second, these results alone may not be enough to develop
a mechanistic understanding of how variants impact brain function. To address this second challenge, we will
implement a computationally efficient machine-learning framework for predicting the specific regulatory
functions of single-cell open chromatin regions identified by our network model and other approaches. Gene regulatory
enhancers are particularly amenable to this approach because high-throughput mouse transgenics and
massively parallel reporter assays have generated enough validated enhancers for supervised learning. Our
framework will be easy to apply to other regulatory functions, such as insulating boundaries in chromosome
conformation capture data. By developing a compressed yet flexible featurization of massive bulk and single-cell
data compendia, we will enable rapid iteration of computationally intensive prediction algorithms applied to
single-cell open chromatin regions. Our approach will also incorporate transfer learning from data-rich (e.g., postmortem or
mouse brains) to data-poor settings (e.g., human late-gestation brains). We expect predicted regulatory elements
to be more strongly enriched for psychiatric disorder genetic risk, to provide mechanistic insight into how variants
cause disease, and to serve as useful molecular tools. Together, our two proposed computational approaches will
leverage the complementary strengths of bulk and single-cell data to resolve regulatory elements that drive the
exquisite diversity of cells in devel...

## Key facts

- **NIH application ID:** 10144503
- **Project number:** 5R01MH123178-02
- **Recipient organization:** J. DAVID GLADSTONE INSTITUTES
- **Principal Investigator:** KATHERINE S. POLLARD
- **Activity code:** R01
- **Funding institute:** NIH (National Institute of Mental Health, per the MH code in the project number)
- **Fiscal year:** 2021
- **Award amount:** $606,575
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2020-04-15 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10144503

## Citation

> US National Institutes of Health, RePORTER application 10144503, Resolving single-cell brain regulatory elements with bulk data supervised models (5R01MH123178-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10144503. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
