# Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data

> **NIH F30** · YALE UNIVERSITY · 2020 · $50,520

## Abstract

Single-cell RNA-sequencing (scRNA-seq) is revolutionizing the study of gene expression. In contrast to bulk RNA-sequencing, which measures the average expression across all cells in a sample, scRNA-seq allows researchers to measure gene expression in each cell individually. This new technology has profound implications for both basic and clinical research, but it also presents unique analytic challenges. Among them is the problem of scale: scRNA-seq datasets are growing exponentially in size, with recently developed droplet-based technologies already profiling over 1 million cells in a single experiment. When applied to data of this scale, the most common methods for dimensionality reduction and visualization of scRNA-seq data, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), require many hours of processing on servers with large amounts of memory. Furthermore, scRNA-seq technologies attempt to measure the extremely small amount of RNA in individual cells, resulting in a phenomenon called “dropout,” in which a gene is expressed but not detected and hence incorrectly measured as unexpressed.

In this fellowship, specifically tailored and highly scalable analysis methods for scRNA-seq data will be developed:

1. An ultra-fast, out-of-core implementation of randomized PCA, allowing anyone with a standard laptop to perform PCA on even the largest datasets.
2. An improved implementation of t-SNE that incorporates recent theoretical results and uses a numerical approximation technique, the fast multipole method, to dramatically accelerate its runtime.
3. A method for imputing “dropped out” gene expression using recent results from the theory of low-rank matrix completion.

In summary, this research will provide practical tools for analysis and visualization of scRNA-seq data. The fellowship also includes a training plan with valuable learning experiences for the applicant’s development as a physician-scientist who can apply computational and mathematical methods to solving biomedical problems.
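
As a rough illustration of the first proposed method, the sketch below shows one generic way to approximate PCA when the cells-by-genes matrix is only ever read in row chunks. It is a sketch under stated assumptions, not the project's implementation: the function name `chunked_randomized_pca`, its parameters, the `get_chunks` callable, and the synthetic data are all hypothetical, and the algorithm is a standard randomized subspace iteration applied to the centered Gram matrix.

```python
import numpy as np


def chunked_randomized_pca(get_chunks, n_genes, k=5, oversample=10, n_iter=3, seed=0):
    """Approximate the top-k principal axes of a cells x genes matrix.

    get_chunks (hypothetical interface) must return a fresh iterator over
    2-D arrays of shape (cells_in_chunk, n_genes) each time it is called.
    Assumes k + oversample <= n_genes.
    """
    rng = np.random.default_rng(seed)

    # Pass 1: per-gene means, needed to center the data for PCA.
    total = np.zeros(n_genes)
    n_cells = 0
    for chunk in get_chunks():
        total += chunk.sum(axis=0)
        n_cells += chunk.shape[0]
    mean = total / n_cells

    def gram_times(basis):
        # Accumulate (A_c^T A_c) @ basis chunk by chunk, A_c = centered data.
        out = np.zeros_like(basis)
        for chunk in get_chunks():
            centered = chunk - mean
            out += centered.T @ (centered @ basis)
        return out

    # Random orthonormal start, then subspace iteration with QR
    # re-orthonormalization after every pass over the data.
    basis, _ = np.linalg.qr(rng.standard_normal((n_genes, k + oversample)))
    for _ in range(n_iter):
        basis, _ = np.linalg.qr(gram_times(basis))

    # Rayleigh-Ritz step: solve a small symmetric eigenproblem in the subspace.
    small = basis.T @ gram_times(basis)
    small = (small + small.T) / 2
    eigvals, eigvecs = np.linalg.eigh(small)
    order = np.argsort(eigvals)[::-1][:k]
    components = (basis @ eigvecs[:, order]).T  # shape (k, n_genes)
    explained_variance = eigvals[order] / (n_cells - 1)
    return mean, components, explained_variance


# Synthetic stand-in for an expression matrix: 2000 "cells", 50 "genes",
# three strong latent directions plus a little noise.
demo_rng = np.random.default_rng(1)
latent = demo_rng.standard_normal((2000, 3)) * np.array([10.0, 5.0, 2.0])
loadings = np.linalg.qr(demo_rng.standard_normal((50, 3)))[0].T
data = latent @ loadings + 0.01 * demo_rng.standard_normal((2000, 50))


def get_chunks():
    # Stand-in for streaming from disk: yield 250 cells at a time.
    return (data[i:i + 250] for i in range(0, data.shape[0], 250))


mean, components, var = chunked_randomized_pca(get_chunks, n_genes=50, k=3)
```

Besides one chunk at a time, only arrays with `n_genes` rows and `k + oversample` columns are held in memory, which is the property that makes laptop-scale PCA plausible. A real out-of-core version would stream chunks from disk and avoid the dense `chunk - mean` subtraction so that sparse count data stays sparse.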

## Key facts

- **NIH application ID:** 9876960
- **Project number:** 5F30HG010102-03
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** George Linderman
- **Activity code:** F30
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $50,520
- **Award type:** 5
- **Project period:** 2018-03-01 → 2021-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9876960

## Citation

> US National Institutes of Health, RePORTER application 9876960, Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data (5F30HG010102-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9876960. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
