# Developing tools for the unbiased analysis and visualization of scRNA-seq data

> **NIH R01** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2021 · $296,743

## Abstract

Single-cell RNA sequencing (scRNA-seq) provides genome-wide information about gene expression at the
resolution of individual cells. The unprecedented scope of these data is revolutionizing our understanding of
development and tissue homeostasis as well as diseases like cancer. A major issue with scRNA-seq, however,
is the sheer scale of the data, consisting of ~20,000 gene expression measurements in thousands to millions
of cells. Effective computational approaches are clearly required to translate data of this size and complexity
into actionable biological insights. For instance, scRNA-seq data are approximately 20,000-dimensional, and
as a result all available analysis pipelines rely on multiple dimensionality reduction steps. This usually entails a
combination of linear tools like PCA and non-linear techniques like t-SNE and UMAP. The data are generally
reduced to between 10- and 100-D for data analysis (e.g. clustering into distinct cell types) and 2-D for
visualization. The problem, however, is that dimensionality reduction can lead to loss of information. We
recently showed that this loss of information is dramatic: for any given cell, over 95% of its neighbors are
changed in the process of dimensionality reduction. This complete change in the structure of the data can
introduce significant noise and bias into the analysis, and suggests the critical need for alternative approaches.
The premise of this application is that reducing bias in scRNA-seq data analysis will maximize our ability to
extract meaningful information from the data. In this proposal, we focus on developing new algorithms to
address three specific steps in the typical analysis pipeline: (1) Dimensionality Reduction: Our hypothesis is
that deep neural networks can be explicitly trained to maximize the amount of information that can be retained
for both data analysis and visualization. (2) Feature Selection: Not all genes are equally informative for
downstream analyses, so researchers generally choose a subset of genes based on variation in the
population. We have shown that standard approaches to selecting genes convolve true biological variation with
technical noise from the experiment. We hypothesize that statistical models based on our understanding of
sources of technical noise can be used to select more informative genes. (3) Cell clustering: Clustering the
data to determine cell types is critical, but cells with different identities often form complex, overlapping
geometries in gene expression space that are difficult for existing algorithms to resolve. Our hypothesis is that
new clustering tools, guided by prior knowledge and leveraging innovations in clustering from image
segmentation, can overcome this problem. We will build these new tools and test them against existing
benchmark datasets and novel data generated by our experimental collaborators. We will also integrate these
tools into popular scRNA-seq analysis packages. Successful completion of the pro...
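
The neighbor-loss claim in the abstract can be illustrated with a small sketch. The code below is not the investigators' method: it is a minimal example, assuming synthetic Gaussian data in place of real expression counts, plain PCA via SVD as the reduction step, and a hypothetical helper `neighbor_retention` that reports, for each cell, the fraction of its k nearest neighbors in the original space that are still among its k nearest neighbors after reduction.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbors of each row (Euclidean)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def pca_reduce(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def neighbor_retention(X_high, X_low, k=10):
    """Per-cell fraction of original k nearest neighbors kept after reduction."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return np.array(shared) / k


# Synthetic stand-in for a cells-by-genes matrix (assumption: not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))

X_2d = pca_reduce(X, n_components=2)
retention = neighbor_retention(X, X_2d, k=10)
print(f"mean fraction of neighbors retained in 2-D: {retention.mean():.3f}")
```

A retention value near 1 means a cell's local neighborhood survived the reduction; values near 0 correspond to the kind of neighborhood disruption the abstract describes. The number printed for this synthetic matrix says nothing about real scRNA-seq data, where the outcome depends on the dataset, the choice of k, and the reduction method.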

## Key facts

- **NIH application ID:** 10279320
- **Project number:** 1R01GM143378-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Eric J Deeds
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $296,743
- **Award type:** 1
- **Project period:** 2021-09-01 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10279320

## Citation

> US National Institutes of Health, RePORTER application 10279320, Developing tools for the unbiased analysis and visualization of scRNA-seq data (1R01GM143378-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10279320. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*