Leveraging Heterogenous Common Fund Data Sets and Beyond for Identifying Lung Cancer Subtypes

NIH RePORTER · NIH · R03 · $307,000

Abstract

As the leading cause of cancer death in the United States, lung cancer accounts for about 20% of all cancer deaths. There are two major types of lung cancer: non-small cell lung cancer (NSCLC), about 80% to 85% of cases, and small cell lung cancer (SCLC), about 10% to 15% of cases. Each type has multiple distinct subtypes characterized by morphological, molecular, and genetic alterations. Identifying lung cancer subtypes can facilitate downstream risk stratification and tailored treatment design. Conventional methods such as morphological analysis, computed tomography (CT) and other imaging techniques, cytogenetic analysis, immunophenotyping, and molecular profiling have been used for lung cancer subtype identification, but they are usually costly, time-consuming, labor-intensive, and sometimes inaccurate. Next generation sequencing (NGS) has recently been applied to identifying lung cancer subtypes, but these applications are limited to bulk NGS data or to a single omics data type. With large volumes of omics data being generated within and beyond the Common Fund data sets (e.g., GTEx and HuBMAP), we hypothesize that integrating single-cell and bulk multi-omics data, including genomics, transcriptomics, and epigenomics data, will significantly facilitate subtype-specific biomarker discovery and boost the accuracy of lung cancer subtype identification. To address these limitations, we propose to develop an integrated machine learning (ML) framework for accurate and cost-effective lung cancer subtype identification by combining single-cell and bulk multi-omics data within and beyond Common Fund data sets.
To achieve this, we will pursue two specific aims. Aim 1: to establish a gene-signature-transfer ML model that leverages large-scale bulk and single-cell transcriptomics data within and beyond Common Fund data sets for lung cancer subtype identification. Besides identifying well-annotated lung cancer subtypes, we will also explore novel lung cancer subtypes by detecting rare cell types in large-scale single-cell data. Cluster-specific and rare-cell-type-specific gene signatures derived from these data can then be transferred to bulk transcriptomics data to improve the performance of lung cancer subtype identification. Aim 2: to develop a multi-omics integration framework that systematically combines single-cell and bulk multi-omics data (including genomics, transcriptomics, and epigenomics) to further boost lung cancer subtype identification. The model is designed to handle cases in which only partial or incomplete multi-omics data are available for new patients. We believe successful completion of this study will directly improve downstream lung cancer risk stratification, facilitate diagnosis and prognosis, and help optimize treatment selection. We also expect that the proposed framework can be customized and extended to identify subtypes of other cancers.
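
The signature-transfer idea in Aim 1 can be illustrated with a toy sketch. This is not the project's model, which the abstract does not specify. It assumes a simple stand-in: signatures are the genes with the largest mean-expression difference between a single-cell cluster and all other cells, and bulk samples are scored by the mean z-score of those genes. The function names (cluster_signatures, score_bulk) and all data values are hypothetical.

```python
import numpy as np

def cluster_signatures(sc_expr, cluster_labels, top_k=2):
    """Assumed heuristic: for each cluster, keep the top_k genes whose mean
    expression inside the cluster most exceeds the mean in all other cells."""
    signatures = {}
    for c in np.unique(cluster_labels):
        in_cluster = cluster_labels == c
        diff = sc_expr[in_cluster].mean(axis=0) - sc_expr[~in_cluster].mean(axis=0)
        signatures[int(c)] = np.argsort(diff)[::-1][:top_k]
    return signatures

def score_bulk(bulk_expr, signatures):
    """Score each bulk sample against each signature as the mean z-score
    (computed across bulk samples) of the signature's genes."""
    mu = bulk_expr.mean(axis=0)
    sd = bulk_expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (bulk_expr - mu) / sd
    keys = sorted(signatures)
    return np.column_stack([z[:, signatures[k]].mean(axis=1) for k in keys])

# Toy single-cell matrix (cells x genes) with two hypothetical clusters.
sc_expr = np.array([
    [9, 8, 1, 0],
    [8, 9, 0, 1],
    [9, 7, 1, 1],
    [1, 0, 9, 8],
    [0, 1, 8, 9],
    [1, 1, 7, 9],
], dtype=float)
cluster_labels = np.array([0, 0, 0, 1, 1, 1])

# Toy bulk matrix (samples x genes) measured on the same four genes.
bulk_expr = np.array([
    [50, 40, 5, 4],
    [45, 42, 6, 3],
    [5, 6, 48, 44],
    [4, 3, 50, 47],
], dtype=float)

signatures = cluster_signatures(sc_expr, cluster_labels)
scores = score_bulk(bulk_expr, signatures)
predicted = scores.argmax(axis=1)  # index of the best-matching signature per sample
print(signatures, predicted)
```

In practice the signature step would more plausibly rely on a differential-expression test and the transfer step on a trained classifier; the sketch only shows how information learned from single-cell clusters can be applied to bulk samples.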

Key facts

NIH application ID: 10990280
Project number: 1R03OD038391-01
Recipient: UNIVERSITY OF NEBRASKA MEDICAL CENTER
Principal Investigator: Shibiao Wan
Activity code: R03
Funding institute: NIH
Fiscal year: 2024
Award amount: $307,000
Award type: 1
Project period: 2024-09-05 → 2026-09-04