# A statistical framework for disease classification with scRNA-Seq data

> **NIH R01** · UNIVERSITY OF CALIFORNIA BERKELEY · 2023 · $302,495

## Abstract

**Project Summary**

**Background.** Single-cell sequencing data has enormous potential to improve our understanding of human health, with direct applications in diagnosis and therapeutic selection. Single-cell sequencing of mRNA expression levels (scRNA-Seq) initially focused on understanding fundamental biological systems at the single-cell level, but there is an increasing emphasis on using scRNA-Seq to understand the role of single-cell variability in human health outcomes. While the exploration of single-cell variability in humans and its relationship to disease is advancing, the statistical methodology needed to handle this type of data at the human population level lags behind.
**Project Objectives.** Broadly, the long-term goal of this proposal is a coherent methodological framework for analyzing the effect of single-cell variability on patient phenotypes. This proposal considers the setting of population scRNA-Seq studies, in which scRNA-Seq data is collected from many patients representing populations with differing health outcomes. The proposed research consists of developing and evaluating statistical methodologies for these scRNA-Seq population studies. The methodology developed by this proposal will fill a critical gap, helping to unlock the potential of scRNA-Seq data for improving human health.
**Project Methods.** The proposed research program focuses on three specific aims that target the most common analysis needs in scRNA-Seq population studies.

- **Aim 1: Patient-level representation for scRNA-Seq data.** This aim will develop a summary representation of a patient's scRNA-Seq profile and create statistical methods for comparing this summary profile between patient populations.
- **Aim 2: Predicting patient phenotypes based on scRNA-Seq data.** This aim will develop models that predict health phenotypes from a patient's scRNA-Seq measurements.
- **Aim 3: Identifying cell-level and gene-level biomarkers for patient phenotypes.** The methods developed in this aim will identify genes and cell populations that differ at the single-cell level between patient populations. The biomarkers identified by these methods will generate testable hypotheses for future exploration of the mechanistic relationship between single-cell variability and patient outcomes.
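For orientation only, the simplest patient-level summary used in population scRNA-Seq studies is a "pseudobulk" profile: average each gene over all of a patient's cells, then compare the per-patient profiles between populations. The sketch below assumes a cells-by-genes matrix of already-normalized expression and a per-cell patient label; the function names `pseudobulk` and `group_mean_difference` are hypothetical, and this baseline is not the representation the proposal develops.

```python
import numpy as np


def pseudobulk(counts, patient_ids):
    """Average each gene's expression over the cells of each patient.

    counts: (n_cells, n_genes) array of normalized expression values (assumed).
    patient_ids: length-n_cells sequence of patient labels.
    Returns (patients, matrix): sorted unique patient labels and an
    (n_patients, n_genes) matrix with one averaged profile per patient.
    """
    counts = np.asarray(counts, dtype=float)
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)  # sorted unique patient labels
    # One row per patient: the mean over that patient's cells.
    matrix = np.vstack([counts[patient_ids == p].mean(axis=0) for p in patients])
    return patients, matrix


def group_mean_difference(matrix, patients, case_patients):
    """Per-gene difference in mean pseudobulk profile, cases minus controls."""
    is_case = np.isin(patients, list(case_patients))
    return matrix[is_case].mean(axis=0) - matrix[~is_case].mean(axis=0)


# Usage: two patients with two cells each, measured on two genes.
counts = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
patients, profiles = pseudobulk(counts, ["p1", "p1", "p2", "p2"])
print(profiles)  # rows are the mean profiles of p1 and p2
print(group_mean_difference(profiles, patients, {"p2"}))
```

Averaging over cells discards cell-to-cell variability within each patient, which is exactly the kind of information the aims above seek to retain; the sketch only marks the baseline such methods would be compared against.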

## Key facts

- **NIH application ID:** 10707488
- **Project number:** 5R01GM144493-02
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** Elizabeth Purdom
- **Activity code:** R01 (Research Project Grant)
- **Funding institute:** NIH (National Institute of General Medical Sciences)
- **Fiscal year:** 2023
- **Award amount:** $302,495
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2022-09-20 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10707488

## Citation

> US National Institutes of Health, RePORTER application 10707488, A statistical framework for disease classification with scRNA-Seq data (5R01GM144493-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10707488. Licensed CC0.

