# Semi-supervised cross-modality translation for single-cell genomics and proteomics

> **NIH K99** · UNIVERSITY OF WASHINGTON · 2024 · $129,723

## Abstract

We propose a two-year training program to prepare the candidate for a successful transition
to independence in the field of machine learning integration algorithms that predict undercharacterized
genomic and proteomic data.
The training plan is designed to guide the candidate's scientific and professional development, under the mentorship of Dr. Noble and guidance from five advisory committee members (Drs. Christine Disteche, Jay Shendure,
Brian Beliveau, Mike MacCoss, and Sheng Wang) at the University of Washington. The committee will help the
candidate extend their knowledge of proteomics, spatial imaging, and machine learning development.
The proposed research focuses on developing semi-supervised machine learning integration tools that can
predict various types of single-cell profiles (e.g., chromatin accessibility, spatial locations, proteomics) from
known measurements (e.g., gene expression).
In Aim 1, we propose to computationally fill in the gaps between single-cell time-series snapshots to infer continuous
changes in cellular profiles, by building a conditional variational autoencoder model with continuous time representations.
The model will enable us to infer temporal maps of single cells in conditions where fewer time points
were captured (e.g., a mouse mutational strain collected in another experiment, human embryonic development, chromatin
accessibility measurements).
In Aim 2, we will develop a semi-supervised joint machine learning model that stitches together the conditional
variational autoencoder model and a graph neural network to predict the physical locations of cells from dissociated
gene expression and chromatin accessibility measurements.
In Aim 3, we will combine the semi-supervised framework with deep tensor factorization and use genomics
and bulk assays to infer single-cell proteomics profiles and identify genome-scale protein markers in single cells
with only gene expression profiles.
The research plan will generate computational tools to project single cells to their spatiotemporal contexts
and to understand the protein mediators. The tools will be generally applicable to studies of complex biological
systems (e.g., embryonic development) and diseases (e.g., cancer). With rapid developments in single-cell
time-series, spatial imaging, and proteomics technologies, we expect our methods to have increasing power for biological
knowledge discovery.

## Key facts

- **NIH application ID:** 10983900
- **Project number:** 1K99HG013343-01A1
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Ran Zhang
- **Activity code:** K99
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $129,723
- **Award type:** 1
- **Project period:** 2024-09-04 → 2025-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10983900

## Citation

> US National Institutes of Health, RePORTER application 10983900, Semi-supervised cross-modality translation for single-cell genomics and proteomics (1K99HG013343-01A1). Retrieved via AI Analytics 2026-06-24 from https://api.ai-analytics.org/grant/nih/10983900. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
