# Causal Representation Learning for the Spatial Analysis of Transcriptomic and Imaging Data in Tissue Contexts

> **NIH DP2** · BROAD INSTITUTE, INC. · 2022 · $1,378,800

## Abstract

NIH New Innovators Award

By melding imaging and genomics, it is now possible to obtain spatially resolved transcriptomic
datasets; however, computational methods for analyzing such datasets have lagged behind
experimental developments. To realize the full potential of spatial transcriptomic (ST) data, we
cannot rely on methods developed for single-cell data, because they divorce
cells from their microenvironment. Just as experimental ST breakthroughs came from
melding imaging and sequencing, we argue that the same will hold true in the
computational domain, and we therefore propose a framework for the analysis of these data that
integrates imaging and sequencing with causality to infer regulatory mechanisms underlying
spatially driven processes.

We propose to achieve this through an innovative unification of two vibrant areas in machine
learning (ML): representation learning and causal inference. This is a challenging task since
representation learning, although successful in predictive tasks like recommender systems,
does not generally elucidate causal relationships. To overcome this, we will use representation
learning to identify correlations that are present in all data modalities available in ST, and
thereby distinguish spurious correlations from causal ones using the principle of invariance. In
addition, we will build on three fundamental concepts in ML:

- Image inpainting: to identify motifs in tissue architecture as well as anomalous tissue patches
- Optimal transport: to infer tissue lineages from snapshots in time
- Causal structure discovery: to identify regulatory modules and predict the effect of perturbations

This unification will result in an ML framework that integrates space, time, and expression to
identify biological mechanisms underlying spatial processes. Although this framework will be
broadly applicable, it is centered on three disease contexts for which ST data have already been
obtained and which will serve as test beds for refining our methods:

- Inflammation/fibrosis in the gut: to study cell recruitment, matrix deposition, and clearance;
- Alzheimer's disease: to study questions of secretion and protein aggregation; and
- Classic Hodgkin lymphoma: to study tumor-immune cell interactions and immunological invasion.

Understanding the regulatory mechanisms of cell-cell communication in these disease contexts
has the potential to give rise to new therapeutic targets that could be validated in partnership
with our experimental collaborators and benefit patients' lives.

## Key facts

- **NIH application ID:** 10471669
- **Project number:** 1DP2AT012345-01
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Caroline Uhler
- **Activity code:** DP2
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $1,378,800
- **Award type:** 1
- **Project period:** 2022-09-15 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10471669

## Citation

> US National Institutes of Health, RePORTER application 10471669, Causal Representation Learning for the Spatial Analysis of Transcriptomic and Imaging Data in Tissue Contexts (1DP2AT012345-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10471669. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
