# Methods for improved detection of activated molecular pathways in cancer

> **NIH F31** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2022 · $40,574

## Abstract

Studying tumors by quantifying gene expression via RNA-sequencing (RNA-seq) has proven crucial
to elucidating their active biological pathways and processes, how they differ from normal tissue, and
how they might be targeted for therapy. Furthermore, new single cell RNA-seq (scRNA-seq)
techniques are beginning to uncover the heterogeneity of tumors by profiling them at single cell
resolution. Deriving knowledge of pathway activity from expression data requires the application of
methods such as Gene Set Enrichment Analysis (GSEA), which is a community standard for
assessing the coordinate up- or down-regulation of pathways, processes, and phenotypes
represented by groups of genes or ‘gene sets’. As GSEA requires high-quality and well-annotated
gene sets for a robust analysis, the Mesirov lab maintains and freely distributes the Molecular
Signatures Database (MSigDB), which contains multiple collections of gene sets to accompany our
GSEA software. Ideally, this database would consist of coherent gene sets, that is, sets whose
member genes show coordinate up-regulation or coordinate down-regulation and specifically indicate
activation or repression of a specific pathway or process relevant to a particular cell type or disease
phenotype. However, due to the manner of collection of some gene sets in MSigDB, e.g., curation
from scientific publications or extraction from canonical pathway databases, some of the gene sets
lack coherence. In addition, users of our GSEA implementations are beginning to input new
scRNA-seq data. However, we have identified statistical problems arising from the sparsity of
scRNA-seq data that make standard GSEA results uninterpretable. To address these concerns, we
propose the following aims.

- **Aim 1:** We will develop a data-driven refinement approach for the gene sets in the MSigDB. Our approach will leverage large-scale compendia of expression datasets and protein-protein interaction networks to use existing gene sets as starting points to construct refined gene sets.
- **Aim 2:** We will use the refinement method from Aim 1 to assemble a new Hallmark collection of refined gene sets for use in GSEA.
- **Aim 3:** We will develop and validate an approach to pathway enrichment detection that accounts for the sparsity of scRNA-seq.

Following the completion of these aims, we will have released a new, freely available collection of
gene sets that enables more robust GSEA, as well as a new method that allows these, or any other,
gene sets to be tested for enrichment in scRNA-seq data.
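
For readers unfamiliar with how GSEA scores the "coordinate up- or down-regulation" of a gene set, the following is a minimal sketch of the weighted running-sum enrichment score at the core of the standard method. It is an illustration only, not the Mesirov lab's implementation or the method proposed in Aim 3; the function name `enrichment_score`, its parameters, and the toy gene names are assumptions introduced here.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Simplified GSEA-style running-sum enrichment score (illustrative sketch).

    ranked_genes: gene names sorted from most up- to most down-regulated
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene names to test
    p:            exponent weighting hits by the magnitude of their score
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(scores, dtype=float)) ** p

    hit = np.where(in_set, weights, 0.0)      # weighted steps up at set members
    n_miss = len(ranked_genes) - in_set.sum()  # number of genes outside the set
    if hit.sum() == 0 or n_miss == 0:
        return 0.0                             # score undefined; report no enrichment

    # Walk down the ranked list: step up at hits, step down at misses.
    running = np.cumsum(hit / hit.sum() - (~in_set) / n_miss)
    # The enrichment score is the maximum deviation from zero, with its sign.
    return float(running[np.argmax(np.abs(running))])

# Toy example with hypothetical genes A-D ranked by a made-up metric.
genes = ["A", "B", "C", "D"]
metric = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(genes, metric, {"A", "B"})     # set at the top of the list
es_bottom = enrichment_score(genes, metric, {"C", "D"})  # set at the bottom of the list
```

A positive score indicates that the set's genes concentrate near the top of the ranking and a negative score that they concentrate near the bottom. In the full method, significance is assessed by comparing the score with a permutation-based null distribution, which this sketch omits.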

## Key facts

- **NIH application ID:** 10380586
- **Project number:** 5F31CA257344-02
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Alexander Thomas Wenzel
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $40,574
- **Award type:** 5
- **Project period:** 2021-04-01 → 2024-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10380586

## Citation

> US National Institutes of Health, RePORTER application 10380586, Methods for improved detection of activated molecular pathways in cancer (5F31CA257344-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10380586. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
