Novel geometric deep learning models for tissue structure-aware spatial expression representations from spatially resolved single-cell transcriptomics data

NIH RePORTER · NIH · R21 · $198,750

Abstract

Project Summary

The continuing advancement of single-cell technologies has ushered in an exciting era of single-cell spatially resolved transcriptomics (scSRT). scSRT by in-situ sequencing (ISS) or multiple rounds of barcode-based hybridization (BCH) can quantify the 2D and even 3D positions of transcripts from hundreds to thousands of genes for individual cells in intact tissues. Emerging applications of scSRT have demonstrated new capabilities to characterize transcriptional complexity associated with tissue heterogeneity and cellular microenvironment in both physiological and pathological contexts. Fully harnessing the potential of scSRT requires innovative computational tools that leverage the spatial information of cells and transcripts. The goal of this application is to develop innovative models that enable the use of scSRT data to identify tissue structures and pathologies, provide the underlying spatial cellular and molecular signatures associated with the pathologies, and discover novel pathological manifestations in previously uncharacterized or new diseases. We have previously analyzed normal and COVID-19 patient lung tissue samples using ISS. We found that popular algorithms for spatial expression clustering based on graph neural networks (GNNs) could not capture tissue structure or COVID-19 pathology. To properly model spatial expression domains consistent with tissue histology, we hypothesize that Graph Deep Learning (GDL) models could learn structure-aware spatial patterns that capture histology and gene expression signatures from scSRT data. We further hypothesize that a semi-supervised strategy analogous to semantic image segmentation that utilizes partial annotations would enable GDL to define the heterogeneity of pathological regions.
To test these hypotheses, we have collected and processed multiple scSRT datasets from different technologies measuring spatial expression in both normal and disease conditions in various tissues. In this project, we propose to develop a contrastive learning-based geometric graph attention model to learn tissue geometry-aware gene expression representations (Aim 1) and a semi-supervised node classification method on the geometric graph to segment tissue pathology domains from spatial gene expression with few annotations (Aim 2). We will systematically evaluate model performance by comparing model outputs against carefully annotated histology regions using the collected datasets. The developed models can be used to discover novel pathological manifestations in diseases, particularly in previously uncharacterized or new diseases, and to provide the spatial cellular and molecular signatures underlying the pathologies. As scSRT is anticipated to revolutionize the study of cellular biology and disease pathology, the proposed models will have a transformative impact on computational and machine learning methods for SRT analyses.
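
As a rough illustration of the setting in Aim 2, the sketch below builds a spatial k-nearest-neighbor graph from cell coordinates and spreads a handful of region annotations across it by neighbor averaging. It is a deliberately simplified stand-in, not the proposed model: it uses no gene expression, no graph attention, and no contrastive learning. The function names `knn_graph` and `propagate_labels`, the parameters `k` and `n_iter`, and the convention that `-1` marks an unannotated cell are all assumptions made for this example.

```python
import numpy as np


def knn_graph(coords, k):
    """Symmetric 0/1 adjacency linking each cell to its k nearest neighbours.

    coords: (n_cells, n_dims) array of cell positions (hypothetical input).
    """
    n = coords.shape[0]
    # Pairwise Euclidean distances between cell positions.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it


def propagate_labels(adj, labels, n_classes, n_iter=50):
    """Spread sparse labels over the graph; labels == -1 means unannotated."""
    n = adj.shape[0]
    known = labels >= 0
    onehot = np.zeros((n, n_classes))
    onehot[known, labels[known]] = 1.0
    # Row-normalise so each cell averages the scores of its neighbours.
    deg = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(deg, 1.0)
    scores = onehot.copy()
    for _ in range(n_iter):
        scores = trans @ scores
        scores[known] = onehot[known]  # annotated cells keep their labels
    return scores.argmax(axis=1)
```

Calling `propagate_labels(knn_graph(coords, k=3), labels, n_classes=2)` returns one predicted region index per cell. A learned model of the kind described in the abstract would replace the fixed averaging step with trainable, expression-aware message passing.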

Key facts

NIH application ID: 10952364
Project number: 1R21GM155774-01
Recipient: UNIVERSITY OF PITTSBURGH AT PITTSBURGH
Principal Investigator: Shou-Jiang Gao
Activity code: R21
Funding institute: NIH
Fiscal year: 2024
Award amount: $198,750
Award type: 1
Project period: 2024-09-01 → 2026-08-31