# Efficient Methods for Dimensionality Reduction of Single-Cell RNA-Sequencing Data

> **NIH F30** · YALE UNIVERSITY · 2021 · $30,846

## Abstract

Single-cell RNA-sequencing is a revolutionary technology enabling discoveries in human physiology and
disease. The datasets generated from single-cell RNA-sequencing experiments are so large that they cannot be
analyzed or visualized using traditional statistical methods until they have been shrunk using a
technique named “dimensionality reduction.” Almost every analysis of single-cell RNA-sequencing data begins
with principal component analysis (PCA) to accomplish dimensionality reduction.
However, single-cell RNA-sequencing presents unique challenges that make PCA difficult. First, these
datasets are so large that computing PCA requires specialized hardware and multiple hours. Fast algorithms to
approximate PCA have been shown to dramatically speed up this process, but have not proliferated in the
single-cell RNA-sequencing community, in part because no parallelized algorithm has been written in the R
computing language. Second, PCA requires the researcher to decide the final desired size of the dataset.
Choosing too small a size discards valuable biological insights, while choosing too large a size
increases the noise. However, there is no consensus on how to pick the optimal size for single-cell
RNA-sequencing, and there is evidence that this size might be systematically underestimated. Lastly, PCA cannot be
applied directly to the count data measured in single-cell RNA-sequencing, so researchers must first apply a
preprocessing technique to normalize it. The current standard in the field is to apply the log transform;
however, several recent studies have shown that the log transform creates statistical biases in single-cell
RNA-sequencing. In this fellowship, specifically tailored, fast methods for performing PCA on single-cell
RNA-sequencing data will be developed:

- 1a) A framework to rigorously measure the consequences of changing preprocessing parameters on the final results of several publicly available single-cell RNA-sequencing datasets, to enable experimentation with PCA on single-cell RNA-sequencing data.
- 1b) An ultra-fast, parallelized implementation of randomized PCA allowing researchers using standard laptops to rapidly perform PCA on single-cell RNA-sequencing data.
- 2) A technique for rigorously choosing the final size when performing principal component analysis on single-cell RNA-sequencing datasets.
- 3) A method for transforming single-cell RNA-sequencing data so that it becomes appropriately distributed, enabling proper use of PCA without incurring statistical biases.

This fellowship also includes a detailed training plan with valuable learning
experiences for the applicant’s development as a physician-scientist who can apply methods from
high-dimensional statistics to solving biomedical problems.
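
As a rough illustration of the pipeline the abstract describes (log-transform preprocessing followed by an approximate PCA), the sketch below uses NumPy on simulated counts. It is not the applicant's implementation: the function names `log_normalize` and `randomized_pca`, the scale factor of 10,000, the oversampling and power-iteration settings, and the simulated Poisson data are all assumptions chosen for illustration. The PCA step is a generic random-projection scheme with QR-stabilized power iterations, written single-threaded and in Python rather than as the parallelized R implementation the project proposes.

```python
import numpy as np


def log_normalize(counts, scale=1e4):
    """Scale each cell (row) to a common library size, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)


def randomized_pca(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top-k principal components of X (cells x genes)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center each gene (column)
    n_cells, n_genes = Xc.shape
    ell = min(k + n_oversamples, n_cells, n_genes)  # sketch width

    # A random Gaussian projection sketches the column space of Xc.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((n_genes, ell)))

    # Power iterations sharpen the sketch; QR keeps them numerically stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small (ell x genes) projected matrix.
    U_small, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)

    scores = (Q @ U_small[:, :k]) * s[:k]  # low-dimensional cell embedding
    components = Vt[:k]                    # gene loadings, one row per component
    explained_variance = s[:k] ** 2 / (n_cells - 1)
    return scores, components, explained_variance


# Hypothetical usage on simulated counts: 200 cells x 500 genes.
demo_rng = np.random.default_rng(1)
demo_counts = demo_rng.poisson(2.0, size=(200, 500))
demo_scores, demo_components, demo_var = randomized_pca(
    log_normalize(demo_counts), k=10
)
print(demo_scores.shape, demo_components.shape)
```

Here `k` plays the role of the "final size" that the abstract says the researcher must choose, and `log_normalize` stands in for the log-transform baseline that the abstract reports as introducing statistical biases. The dense centering step is a simplification; real single-cell matrices are usually sparse, and centering them explicitly would destroy that sparsity.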

## Key facts

- **NIH application ID:** 10124992
- **Project number:** 5F30HG011193-02
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** James Michael Garritano
- **Activity code:** F30
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $30,846
- **Award type:** 5
- **Project period:** 2020-03-16 → 2023-03-15

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10124992

## Citation

> US National Institutes of Health, RePORTER application 10124992, Efficient Methods for Dimensionality Reduction of Single-Cell RNA-Sequencing Data (5F30HG011193-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10124992. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
