Project Summary
A deep understanding of gene regulation and function during craniofacial development is important not only for basic biological knowledge but also for identifying the causal variants and genes underlying many dental, oral, and craniofacial (DOC) diseases. Numerous -omics datasets at the genomic, epigenomic, and (single-cell) transcriptomic levels have been generated for craniofacial development and DOC diseases. These datasets are highly heterogeneous (e.g., in platform, tissue, and developmental stage) and span species (e.g., human and mouse), requiring novel analytical approaches for decoding genetic regulation, molecular function, and cellular maps in craniofacial development. Critically, because human embryonic craniofacial tissue is practically unavailable, there is a large gap between the abundant -omics and functional studies of murine craniofacial development and large-scale human genetic studies of DOC diseases. In this proposal, we combine machine learning, genomics, single-cell RNA sequencing (scRNA-seq), complex disease genetics, and developmental biology to design novel methods for decoding complex genetic regulation and cellular maps during craniofacial development. We propose three specific aims. Aim 1. To develop a deep learning method, DeepFace, for characterizing and prioritizing genetic variants and regulation during craniofacial development. DeepFace is designed to decipher the functional impact of noncoding variants and will be the first deep learning method to integrate cross-species functional features in craniofacial development. We will validate DeepFace using data from genome-wide association studies (15 datasets) and case-parent trio-based whole-genome sequencing (3 datasets) of orofacial clefts (OFCs). This validation will identify potential causal variants in OFCs, including both common variants and de novo mutations. Aim 2. To develop deep learning methods for time-series scRNA-seq data analysis in craniofacial development.
We will develop novel algorithms, including TTNNet for integrating time-series scRNA-seq data and DrivAER for tracing developmental trajectories and identifying driving transcription factors in craniofacial development. We will validate the methods using scRNA-seq datasets from the FaceBase consortium and to-be-generated data for mouse palate formation. Aim 3. To experimentally validate and characterize the top-ranked novel mutations (Aim 1) and regulators (Aim 2). Building on our previous studies, strong preliminary data, and a highly experienced team, this proposal is a timely effort to develop machine learning methods that address the current gap between genomic studies of murine craniofacial development and human genetic studies of orofacial clefts. Successful completion of this project will provide 1) the NIDCR research community with a suite of novel methods and analytical tools for genomic/epigenomic/scRNA-seq data, and 2) the mechanistic assessment of the mutations/genes and transcriptional regulators th...