Computational tools for estimating cell-type-specific effects in bulk RNA-seq and spatial transcriptomics data, using reference single-cell RNA-seq datasets

NIH RePORTER · NIH · R35 · $436,750

Abstract

RNA-seq is a powerful tool for studying molecular biology. However, without cell sorting (or related techniques), conventional RNA-seq applied to tissue samples cannot determine gene expression in the underlying cell types. This is problematic because differential gene expression observed at the tissue level is not necessarily reflected in the underlying cell types, which obscures biological insight. For example, Schmiedel et al. recently applied RNA-seq to 13 purified blood cell types from 106 individuals [1], which uncovered the molecular basis of sex-specific differences in immune response. However, this was obscured when they applied RNA-seq to whole blood only. Single-cell RNA-seq is the obvious candidate to probe cell-type-specific effects more broadly. However, for most tissues, single-cell RNA-seq has been restricted to small sample sizes because of specialized dissociation protocols and cost. Thus, only bulk-tissue RNA-seq data are available for large sample sizes. Crucially, many of these bulk datasets are paired with enormous stores of informative clinical phenotypic data and additional -omics data. These datasets include large NIH initiatives such as GTEx, TCGA, and All of Us, which have collected data on genetics, disease status, outcome, drug treatments, ethnicity, sex, and much more. The critical gap is that we cannot currently study the relationship between cell-type-level gene expression and any of these phenotypes. To overcome this limitation, we will develop computational tools for estimating cell-type-specific differential expression from bulk RNA-seq data when a small reference single-cell RNA-seq dataset is available from the same tissue type. This will allow us to study the cell-type-specific differences in expression that drive human phenotypes and diseases, unlocking the tens of thousands of bulk RNA-seq samples paired with phenotypic data.
The basis for this research program is a previous study in which we developed a method to recover the cell-type-specific effects of inherited genetic variation on gene expression in bulk breast-tumor RNA-seq data. This method allowed us to discover a novel breast cancer risk gene, which was obscured when using conventional methods. Here, we posit that a similar mathematical framework can be adapted to recover any cell-type-specific effect from bulk-tissue RNA-seq. Hence, by leveraging matched single-cell data, we can develop specific tools to perform multiple commonly applied analyses at cell-type-specific resolution from bulk-tissue RNA-seq, including differential expression, correlation, and gene set enrichment analyses. Finally, new spatial transcriptomics technologies are emerging that enable spatially resolved gene expression to be measured directly in tissue sections. These platforms quantify gene expression in situ in ~100 μm barcoded spots. Each spot captures a small cluster of cells, akin to a miniaturized bulk-tissue RNA-seq experiment. Hence, the same abstract mathematical framewo...

Key facts

NIH application ID: 10407563
Project number: 5R35GM138293-03
Recipient: ST. JUDE CHILDREN'S RESEARCH HOSPITAL
Principal Investigator: Paul Geeleher
Activity code: R35
Funding institute: NIH
Fiscal year: 2022
Award amount: $436,750
Award type: 5
Project period: 2020-08-01 → 2025-05-31