Semi-supervised cross-modality translation for single-cell genomics and proteomics

NIH RePORTER · NIH · K99 · $129,723

Abstract

Project Summary/Abstract

Here, we propose a two-year training program to prepare the candidate for a successful transition to independence in developing machine learning integration algorithms that predict undercharacterized genomic and proteomic data. The training plan is designed to guide the candidate's scientific and professional development under the mentorship of Dr. Noble, with guidance from five advisory committee members (Drs. Christine Disteche, Jay Shendure, Brian Beliveau, Mike MacCoss, and Sheng Wang) at the University of Washington. The committee will help the candidate extend their knowledge of proteomics, spatial imaging, and machine learning development.

The proposed research focuses on developing semi-supervised machine learning integration tools that predict various types of single-cell profiles (e.g., chromatin accessibility, spatial locations, proteomics) from known measurements (e.g., gene expression). In Aim 1, we propose to computationally fill the gaps between single-cell time-series snapshots and infer continuous changes in cellular profiles by building a conditional variational autoencoder with a continuous time representation. The model will let us infer temporal maps of single cells in conditions where fewer time points were captured (e.g., a mouse mutant strain collected in another experiment, human embryonic development, or chromatin accessibility measurements). In Aim 2, we will develop a semi-supervised joint model that combines the conditional variational autoencoder with a graph neural network to predict the physical locations of cells from dissociated gene expression and chromatin accessibility measurements. In Aim 3, we will combine the semi-supervised framework with deep tensor factorization, using genomic and bulk assays to infer single-cell proteomic profiles and identify genome-scale protein markers in single cells for which only gene expression profiles are available.
The research plan will generate computational tools that project single cells into their spatiotemporal contexts and identify their protein mediators. The tools will be broadly applicable to studies of complex biological systems (e.g., embryonic development) and diseases (e.g., cancer). With the rapid development of single-cell time-series, spatial imaging, and proteomics technologies, we expect our methods to become increasingly powerful for biological discovery.
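The key idea behind Aim 1 is that time enters the model as a continuous condition rather than a categorical label, so a trained decoder can be queried at time points that were never sampled. The abstract does not say how time is represented; the sketch below assumes a sinusoidal encoding (borrowed from transformer positional encodings) as one possible choice. `time_encoding` and `ToyConditionalDecoder` are hypothetical names, the decoder is a single linear layer with random, untrained weights, and the encoder half of the conditional variational autoencoder is omitted. It illustrates only the conditioning interface, not the proposed method.

```python
import numpy as np


def time_encoding(t, dim=8, max_period=100.0):
    """Map real-valued times (length n) to sinusoidal features of shape (n, dim).

    Assumption: a sinusoidal encoding is used as the continuous time
    representation. Because it is a smooth function of t, nearby times
    map to nearby feature vectors.
    """
    assert dim % 2 == 0, "dim must be even (half sine, half cosine)"
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    half = dim // 2
    # Geometric sequence of frequencies from 1 down to about 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = t * freqs  # shape (n, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


class ToyConditionalDecoder:
    """Hypothetical linear decoder for p(x | z, t); weights are random, not trained."""

    def __init__(self, latent_dim, n_genes, time_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.time_dim = time_dim
        self.W = rng.normal(scale=0.1, size=(latent_dim + time_dim, n_genes))
        self.b = np.zeros(n_genes)

    def decode(self, z, t):
        # Concatenate each cell's latent state with its continuous time code.
        h = np.concatenate([z, time_encoding(t, self.time_dim)], axis=1)
        # Softplus, log(1 + exp(x)), keeps predicted expression rates positive.
        return np.logaddexp(0.0, h @ self.W + self.b)


if __name__ == "__main__":
    dec = ToyConditionalDecoder(latent_dim=10, n_genes=50)
    z = np.zeros((3, 10))
    # Query times that fall between observed snapshots.
    x = dec.decode(z, [9.5, 10.25, 11.0])
    print(x.shape)  # (3, 50)
```

In a real model the decoder would be a trained neural network and `z` would come from an encoder; the point here is only that any real-valued `t` is a valid input.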

Key facts

NIH application ID: 10983900
Project number: 1K99HG013343-01A1
Recipient: UNIVERSITY OF WASHINGTON
Principal Investigator: Ran Zhang
Activity code: K99
Funding institute: NIH
Fiscal year: 2024
Award amount: $129,723
Award type: 1
Project period: 2024-09-04 → 2025-06-30