# Leveraging Heterogenous Common Fund Data Sets and Beyond for Identifying Lung Cancer Subtypes

> **NIH R03** · UNIVERSITY OF NEBRASKA MEDICAL CENTER · 2024 · $307,000

## Abstract

As the leading cause of cancer death in the United States, lung cancer accounts for about 20% of all cancer
deaths. Lung cancer has two major types, non-small cell lung cancer (NSCLC; 80% to 85% of cases) and small
cell lung cancer (SCLC; 10% to 15% of cases), and each type has multiple distinct subtypes
characterized by morphological, molecular, and genetic alterations. Identifying lung cancer subtypes can
facilitate downstream risk stratification and tailored treatment design. While various conventional methods like
morphological analysis, computed tomography (CT) and other imaging techniques, cytogenetic analysis,
immunophenotyping, and molecular profiling have been used for lung cancer subtype identification, they are
usually costly, time-consuming, labor-intensive, and sometimes inaccurate. Recent work has applied
next-generation sequencing (NGS) to identify lung cancer subtypes, but these approaches are limited to bulk
NGS data or to a single omics type. With large volumes of omics data being generated within and beyond the Common
Fund data sets (e.g., GTEx and HuBMAP), we hypothesize that integration of single-cell and bulk multi-omics
data including genomics, transcriptomics, and epigenomics data will significantly facilitate subtype-specific
biomarker discovery and boost the accuracy of lung cancer subtype identification. To address these limitations,
we propose to develop an integrated machine learning (ML) framework for accurate and cost-effective
lung cancer subtype identification by combining single-cell and bulk multi-omics data within and beyond
Common Fund data sets. To achieve this, we will pursue two specific aims. Aim 1 is to establish a
gene-signature-transfer ML model that leverages large-scale bulk and single-cell transcriptomics data within and
beyond Common Fund data sets for lung cancer subtype identification. Besides identifying well-annotated lung
cancer subtypes, we will also explore novel lung cancer subtypes by detecting rare cell types from large-scale
single-cell data, from which cluster-specific and rare-cell-type-specific gene signatures can be transferred to the
bulk transcriptomics data to improve the performance of lung cancer subtype identification. Aim 2 is to develop a
multi-omics integration framework to systematically combine single-cell and bulk multi-omics data (including
genomics, transcriptomics, and epigenomics) to further improve lung cancer subtype identification. Our model is
flexible enough to handle cases in which only partial or incomplete multi-omics data are available for new patients.
We believe successful completion of this study will directly improve downstream lung cancer risk
stratification, facilitate diagnosis and prognosis, and optimize treatment selection. We also expect that the
proposed framework can be customized and extended to identify subtypes of other types of
cancer.
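
The abstract does not specify algorithms, so the following Python sketch is only an illustration of the two ideas it names: transferring cluster-specific gene signatures from single-cell data to bulk samples (Aim 1) and tolerating missing omics modalities for a new patient (Aim 2). Everything in it is an assumption made for brevity rather than the project's method: the function names `cluster_signatures`, `score_bulk`, and `fuse_available` are hypothetical, signatures are ranked by a simple mean difference, bulk samples are scored by averaging z-scores, and modalities are combined by plain late-fusion averaging.

```python
import numpy as np


def cluster_signatures(sc_expr, cluster_labels, top_k=2):
    """Pick, for each single-cell cluster, the top_k genes whose mean
    expression inside the cluster most exceeds the mean outside it.

    sc_expr: (cells x genes) array; cluster_labels: one label per cell.
    Returns {cluster label: array of gene column indices}.
    """
    sc_expr = np.asarray(sc_expr, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    signatures = {}
    for c in np.unique(cluster_labels):
        in_cluster = cluster_labels == c
        diff = sc_expr[in_cluster].mean(axis=0) - sc_expr[~in_cluster].mean(axis=0)
        # Sort descending by mean difference and keep the strongest genes.
        signatures[int(c)] = np.argsort(diff)[::-1][:top_k]
    return signatures


def score_bulk(bulk_expr, signatures):
    """Score each bulk sample against each signature.

    Genes are z-scored across bulk samples, then the z-scores of the
    signature genes are averaged. Returns a (samples x clusters) array
    with columns ordered by cluster label.
    """
    bulk_expr = np.asarray(bulk_expr, dtype=float)
    mu = bulk_expr.mean(axis=0)
    sd = bulk_expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (bulk_expr - mu) / sd
    return np.column_stack(
        [z[:, genes].mean(axis=1) for _, genes in sorted(signatures.items())]
    )


def fuse_available(modality_scores):
    """Average per-modality subtype scores, skipping modalities that are
    None (i.e. not measured for this patient)."""
    available = [
        np.asarray(s, dtype=float) for s in modality_scores.values() if s is not None
    ]
    if not available:
        raise ValueError("at least one modality is required")
    return np.mean(available, axis=0)
```

In this sketch, a bulk sample would be assigned to the cluster whose signature score is highest (for example via `score_bulk(...).argmax(axis=1)`), and `fuse_available` would be called with a dictionary such as `{"rna": [...], "methylation": None}` when a modality is unavailable. A real pipeline would need normalization, batch correction, and a trained classifier, none of which are shown here.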

## Key facts

- **NIH application ID:** 10990280
- **Project number:** 1R03OD038391-01
- **Recipient organization:** UNIVERSITY OF NEBRASKA MEDICAL CENTER
- **Principal Investigator:** Shibiao Wan
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $307,000
- **Award type:** 1
- **Project period:** 2024-09-05 → 2026-09-04

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10990280

## Citation

> US National Institutes of Health, RePORTER application 10990280, Leveraging Heterogenous Common Fund Data Sets and Beyond for Identifying Lung Cancer Subtypes (1R03OD038391-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10990280. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*