# Finding emergent structure in multi-sample biological data with the dual geometry of cells and features

> **NIH R01** · MICHIGAN STATE UNIVERSITY · 2020 · $353,150

## Abstract

A fundamental question in biomedical data analysis is how to capture biological heterogeneity and characterize the
complex spectrum of health states (or disease conditions) in patient cohorts. Indeed, much effort has been invested in
developing new technologies that provide groundbreaking collections of genomic information at single-cell
resolution, unlocking numerous potential advances in understanding the progression and driving forces of biological
states. However, these new biomedical technologies produce large volumes of data, quantified by numerous
measurements, and often collected in many batches or samples (e.g., from different patients, locations, or times).
Exploration and understanding of such data are challenging tasks, but the potential for new discoveries at a level
previously not possible justifies the considerable effort required to overcome these difficulties.

In this project we focus on multi-sample single-cell data, e.g., from a multi-patient cohort, where data points
represent cells, data features represent gene expressions or protein abundances, and samples (e.g., considered as
separate batches or datasets) represent patients. We consider a duality or interaction between constructing an
intrinsic geometry of cells (e.g., with manifold learning techniques) and processing data features as signals over it
(e.g., with graph signal processing techniques). We propose the utilization of this duality for several data exploration
tasks, including data denoising, identifying noise-invariant phenomena, cluster characterization, and aligning cellular
features over multiple datasets. Furthermore, we expect the dual multiresolution organization of data points and
features to allow us to compute aggregated signatures that represent patients, and then provide a novel data
embedding that reveals multiscale structure from the cellular level to the patient level.

The proposed research combines recent advances in several fields at the forefront of data science, including
geometric deep learning, manifold learning, and harmonic analysis. The methods developed in this project will provide
novel advances in each of these fields, while also establishing new relations between them. Furthermore, the
challenges addressed by these methods are a foundational prerequisite for new advances in genomic research, and
more generally in empirical data analysis where data is collected in varying experimental environments. The
developed algorithms and methods in this project will be validated in several biomedical settings, including
characterizing Zika immunity in Dengue patients, tracking progress of Lyme disease, and predicting the effectiveness
of immunotherapy.
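
The duality described in the abstract, building a geometry over cells and then treating each feature as a signal on that geometry, can be illustrated with a small sketch. This is not the project's method: it assumes a k-nearest-neighbor graph with a Gaussian kernel as the cell geometry and a few steps of a row-normalized diffusion operator as a low-pass graph filter for denoising. The function names `knn_affinity` and `diffuse_features`, the parameters `k` and `t`, and the median-based bandwidth are all hypothetical choices made for illustration.

```python
import numpy as np

def knn_affinity(cells, k=5):
    """Symmetric k-nearest-neighbor Gaussian affinity matrix over cells (rows)."""
    sq = np.sum(cells ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * cells @ cells.T, 0.0)
    np.fill_diagonal(d2, np.inf)             # exclude each cell from its own neighbors
    idx = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest cells per row
    rows = np.repeat(np.arange(len(cells)), k)
    cols = idx.ravel()
    sigma2 = np.median(d2[rows, cols])       # assumed bandwidth: median neighbor distance
    W = np.zeros_like(d2)
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)                # symmetrize the neighbor graph

def diffuse_features(cells, features, k=5, t=3):
    """Smooth feature signals over the cell graph with t diffusion steps."""
    W = knn_affinity(cells, k) + np.eye(len(cells))   # self-loops keep the walk lazy
    P = W / W.sum(axis=1, keepdims=True)              # row-stochastic diffusion operator
    out = np.asarray(features, dtype=float).copy()
    for _ in range(t):
        out = P @ out                                  # each step averages over neighbors
    return out

# Toy data: 200 cells with 10 measured features (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X_denoised = diffuse_features(X, X, k=5, t=3)
print(X_denoised.shape)
```

Because each diffusion step replaces a cell's value with a weighted average over its graph neighbors, larger `t` removes more high-frequency variation across the cell geometry, at the cost of blurring genuinely local structure.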

## Key facts

- **NIH application ID:** 10022130
- **Project number:** 5R01GM135929-02
- **Recipient organization:** MICHIGAN STATE UNIVERSITY
- **Principal Investigator:** Matthew John Hirn
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $353,150
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-09-23 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10022130

## Citation

> US National Institutes of Health, RePORTER application 10022130, Finding emergent structure in multi-sample biological data with the dual geometry of cells and features (5R01GM135929-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10022130. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
