# Collaborative Research: Advanced statistical methods for single cell RNA sequencing studies

> **NIH R01** · UNIVERSITY OF CHICAGO · 2021 · $314,378

## Abstract

Single cell RNA sequencing has emerged as a powerful tool in genomics and has been used in a wide
variety of applications, providing unprecedented insights into many basic biological questions that were
previously difficult to address. However, analyzing scRNAseq data poses important statistical and
computational challenges that require the development of new computational and statistical methods.
Key challenges include: (1) lack of robust statistical methods that can control for hidden confounding
effects in a range of settings; (2) lack of accurate cell subpopulation clustering methods that are
tailored to scRNAseq studies; and (3) difficulty in identifying functional genetic variations with scRNAseq
alone and difficulty in integrating scRNAseq with other genetic studies, including genome-wide association
studies (GWASs). Our proposed methods will address these challenges and are innovative in the following aspects: (1)
our method for controlling for hidden confounding effects bridges between two existing classes of statistical
methods for removing confounding effects and is thus expected to perform robustly across a range of
scenarios; (2) our method for clustering cell subpopulations extracts clustering information from a low-dimensional
representation of scRNAseq data and is thus expected to produce accurate results even when
the original high-dimensional gene expression matrix is noisy; and (3) our method for identifying
allele-specific/biased expression using scRNAseq data alone represents the first such attempt and our method for
integrating scRNAseq with GWASs also represents the first such attempt. All our proposed methods are
tailored to scRNAseq data and will cope with the complexities and unique features of scRNAseq data,
including, but not limited to, low coverage, the count nature of the data, and dropout events. We will develop, distribute,
and support user-friendly open-source software implementing our methods to benefit the genomics and
statistics communities. The statistical methods developed here will pave the way for developing similar methods
for other sequencing studies, including bisulfite sequencing and ATAC-seq studies. The proposed methods
are essential for understanding the heterogeneity of tissue compositions and the genetic architecture of
complex traits and diseases, both of which are questions of central importance to human health.

## Key facts

- **NIH application ID:** 10155503
- **Project number:** 5R01GM126553-05
- **Recipient organization:** UNIVERSITY OF CHICAGO
- **Principal Investigator:** Mengjie Chen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $314,378
- **Award type:** 5
- **Project period:** 2017-08-01 → 2023-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10155503

## Citation

> US National Institutes of Health, RePORTER application 10155503, Collaborative Research: Advanced statistical methods for single cell RNA sequencing studies (5R01GM126553-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10155503. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
