# Imputing single cell RNA sequencing data: Mathematical, statistical and computational challenges

> **NIH R01** · NORTH CAROLINA STATE UNIVERSITY RALEIGH · 2021 · $1

## Abstract

Novel single-cell RNA sequencing (scRNA-seq) technologies can simultaneously measure the expression levels of all
30,000 genes over thousands to millions of individual cells. The analysis of scRNA-seq data has already led to
fundamental advances in biology, including the discovery of new cell types, the detection of subtle differences between
similar cells, and the reconstruction of cellular developmental trajectories. Single-cell measurements involve
amplification of tiny amounts of RNA and result in extremely sparse data matrices with many zeros. While some of
these zeros are due to missing data (dropouts), others represent true biological inactivity. Yet many scRNA-seq
imputation methods treat all observed zero entries identically, leading to imputed matrices that often overestimate
transcriptional activity. Other methods that do attempt to distinguish biological zeros from dropouts lack rigorous
theoretical guarantees. The goals of this proposal are to develop models, supporting mathematical theory, and
computational tools that explicitly take the existence of true biological zeros into account. Matrix imputation under
this constraint involves both computational challenges and theoretical questions in random matrix theory and
high-dimensional statistics. These include rank estimation, low-rank sparse matrix recovery from partially
observed data, and biclustering in the presence of dropouts and zeros. We plan to develop novel approaches based on
non-smooth continuous optimization and to derive accompanying statistical guarantees. We also plan to develop
ensemble learning approaches that cleverly combine the outputs of multiple imputation algorithms. Finally, we hope
to gain important insights into recovery from such data through a study of minimax rates and information-theoretic lower
bounds. To address these challenges, we will build on our promising preliminary results and the joint expertise of the
investigators in spectral methods, high-dimensional statistics, matrix analysis, numerical optimization, and genomics.
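The abstract does not specify the proposed algorithms, so the following is only a reference point, not the investigators' method. It is a minimal sketch of soft-impute (Mazumder, Hastie and Tibshirani), a standard nuclear-norm matrix-completion method whose singular-value soft-thresholding step is a non-smooth convex optimization. It shows where the dropout-versus-biological-zero choice enters: the boolean `observed` mask. The function name `soft_impute`, the penalty `lam`, the stopping rule, and the 4 x 4 toy matrix are illustrative assumptions.

```python
import numpy as np


def soft_impute(X, observed, lam, max_iter=500, tol=1e-6):
    """Illustrative soft-impute: nuclear-norm-penalized low-rank completion.

    X        : (cells x genes) matrix; entries outside `observed` are ignored
    observed : boolean mask, True where an entry is trusted as a real measurement
    lam      : amount subtracted from every singular value (nuclear-norm penalty)
    """
    X = np.asarray(X, dtype=float)
    Z = np.zeros_like(X)
    for _ in range(max_iter):
        # Keep trusted entries; fill the untrusted ones with the current estimate.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Proximal step for the non-smooth nuclear norm: soft-threshold singular values.
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Hypothetical toy data: a 4 x 4 matrix of ones with a single zero at (0, 0).
X = np.ones((4, 4))
X[0, 0] = 0.0

# Policy 1: treat every zero as a dropout, so the zero is excluded from the fit.
Z_all_missing = soft_impute(X, observed=X > 0, lam=1.0)

# Policy 2: treat the zero as a trusted biological zero, so it stays in the fit.
Z_bio_zero = soft_impute(X, observed=np.ones_like(X, dtype=bool), lam=1.0)

print(Z_all_missing[0, 0], Z_bio_zero[0, 0])
```

Under the first policy the surrounding ones pull the estimate at (0, 0) upward, while the second policy yields a smaller value there; neither is exactly zero because the nuclear-norm penalty also shrinks the fit. Deciding which zeros belong in the mask is the problem the abstract describes; in this sketch the mask is simply given.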

## Key facts

- **NIH application ID:** 10242066
- **Project number:** 5R01GM135928-03
- **Recipient organization:** NORTH CAROLINA STATE UNIVERSITY RALEIGH
- **Principal Investigator:** Eric C Chi
- **Activity code:** R01
- **Funding institute:** NIH (National Institute of General Medical Sciences, per the GM code in the project number)
- **Fiscal year:** 2021
- **Award amount:** $1
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-09-23 → 2021-09-02

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10242066

## Citation

> US National Institutes of Health, RePORTER application 10242066, Imputing single cell RNA sequencing data: Mathematical, statistical and computational challenges (5R01GM135928-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10242066. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
