Towards Foundational 3D In Silico Models of Whole Mouse Embryogenesis

NIH RePORTER · NIH · DP2 · $1,312,485

Abstract

PROJECT SUMMARY A decade ago, with the advent of next-generation sequencing of the human pathogen Mycoplasma genitalium, Karr et al. reported the first whole-cell model that synthesizes diverse mathematical approaches to predict a broad spectrum of biological processes. Given recent advances in single-cell and spatial genomics, along with the amassed cell atlas of embryogenesis, creating in silico models of entire mammalian embryogenesis, a long-sought goal in computational biology, seems attainable. Nevertheless, two pivotal gaps remain: (1) Capturing the intricate, multi-faceted nature of embryogenesis requires a cost-effective technology capable of profiling entire embryos at single-cell resolution, with high temporal resolution, in 3D space. (2) Building the in silico model from the massive, high-dimensional datasets requires powerful machine learning techniques adept at directly learning complex data-driven models and at making non-trivial predictions. In this proposal, I aim to construct the first foundational in silico model of whole mouse embryogenesis. To begin, I will use Ultima’s innovative and cost-efficient “mostly natural sequencing-by-synthesis” chemistry, combined with its ultra-high field of view wafer disc platform, to establish a large-scale 3D multi-omics cell atlas of mouse embryogenesis from E6.5 to E16.5, sampled at one-day intervals and comprising a total of 50 million cells. The versatility of Ultima’s UG100 platform allows us to couple it with RNA metabolic labeling, CRISPR-Cas9-based lineage tracing, and multi-omics, thereby producing a comprehensive, high-definition 3D cell atlas of mouse embryogenesis. Subsequently, I plan to devise sophisticated temporal modeling techniques for learning multi-scale, multi-modal RNA velocity vector fields. Focusing on the spatial aspect, I will devise an RNA signal-based segmentation technique for single-cell resolved spatial transcriptomics.
Computer vision methods, such as Gaussian processes, will be used to align serial 2D slices and reconstruct the 3D embryos. To unite the temporal and spatial dimensions of the data, we will extend our RNA velocity vector field model to encompass data-driven partial differential equation (PDE) models. Preliminary findings suggest our model can accurately simulate C. elegans embryogenesis in its entirety, starting from a single zygote and accounting for protein expression, cell migration, and cell fate dynamics. In parallel, to harness the vast existing datasets, we will integrate our PDE-like model with the Generative Pre-trained Transformer (as used in ChatGPT). This integration will equip our foundational model to seamlessly manage spatial, temporal, and multi-omics data. Prioritizing interpretability and predictability, we will leverage differential geometry analysis as done in my previous Dynamo framework. By merging cutting-edge technology with computational innovation, this project seeks to bridge critical...

Key facts

NIH application ID: 10910636
Project number: 1DP2HG014282-01
Recipient: STANFORD UNIVERSITY
Principal Investigator: Xiaojie Qiu
Activity code: DP2
Funding institute: NIH
Fiscal year: 2024
Award amount: $1,312,485
Award type: 1
Project period: 2024-09-19 → 2027-08-31