# Statistical Methods for RNA-seq Data Analysis

> **NIH R01** · FRED HUTCHINSON CANCER CENTER · 2024 · $437,166

## Abstract

Single-cell RNA-seq (scRNA-seq) data have revolutionized our understanding of biology at the cellular level. Spatial
transcriptomics further moves the field forward by providing spatial context of gene expression. These exciting
techniques have been applied in many basic science or clinical research projects to understand living systems
or the biological basis for disease diagnosis, treatment, and prevention. The data generated by scRNA-seq or
spatial transcriptomics typically have high dimension (the number of genes) and large sample size (the number
of cells or spatial spots). Many biological processes underlying the observed gene expression data are likely
non-linear functions of high dimensional gene expression data. Large sample size combined with non-linear
signals of high dimensional data makes deep learning an appropriate tool to analyze scRNA-seq or spatial
transcriptomics data. Earlier deep learning work on scRNA-seq or spatial transcriptomics has focused on
unsupervised tasks, such as de-noising or clustering. For many biomedical applications, a natural next step is
supervised analysis, e.g., comparing scRNA-seq or spatial transcriptomics data between two conditions. Far
fewer studies take this direction, where deep learning methods face two general challenges: interpretability
and noisy labels of single cells. In this project, we aim to address the interpretability challenge with a flexible
method to incorporate gene annotation into deep learning. To work with single cells with noisy labels, we
propose a mixture model that iteratively refines cell labels and the neural network that predicts cell labels. Our
work on spatial transcriptomics focuses on using these data to train deep learning models to interpret
histological images, particularly H&E stained histological images. Our method provides spatial annotation of
histological images in terms of cell type proportions and interactions between any two cell types. Histological
images are universally available in many clinical settings. In contrast, spatial transcriptomics is harder to scale
due to cost and logistical challenges. Our method enables the transfer of knowledge from spatial transcriptomics
to histological images. Once trained on an appropriate dataset with both spatial transcriptomics and
histological images, our method can be applied to analyze datasets with only histological images and assess
their associations with phenotypic or clinical outcomes. In summary, our computational methods address
fundamental questions in scRNA-seq and spatial transcriptomics data analysis, and they are applicable to most
basic science or clinical research projects that produce relevant data.

## Key facts

- **NIH application ID:** 10903801
- **Project number:** 5R01GM105785-11
- **Recipient organization:** FRED HUTCHINSON CANCER CENTER
- **Principal Investigator:** Wei Sun
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $437,166
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2014-05-15 → 2027-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10903801

## Citation

> US National Institutes of Health, RePORTER application 10903801, Statistical Methods for RNA-seq Data Analysis (5R01GM105785-11). Retrieved via AI Analytics 2026-06-01 from https://api.ai-analytics.org/grant/nih/10903801. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
