PROJECT SUMMARY Congenital heart disease (CHD) is the most common birth defect, yet its genetics are poorly understood. Known genomic mechanisms include distinct rare copy number variants (CNVs) and protein-coding single nucleotide variants (SNVs). Isolated CHD, that is, CHD without other congenital anomalies, comprises 75% of all CHD cases. Genome sequencing (GS) studies of isolated CHD have focused primarily on protein-coding regions, identifying disease-causal variants in only ~10-20% of subjects. This substantial knowledge gap suggests that other etiologies, such as variation in the non-coding genome, may play a role. The non-coding genome is vast, constituting 98% of the genome, and encompasses multiple feature types, including non-coding RNAs. There is growing evidence for the role of long non-coding RNAs (lncRNAs) in disease, including developmental disorders of the heart. As such, the long-term goal of this study is to elucidate how lncRNAs contribute to cardiac malformations. The overarching objective of the proposed investigation is to develop computational methods that predict the function of lncRNAs involved in heart development and the pathogenic impact of variants that affect these molecules and lead to heart maldevelopment. We will use GS data from the Gabriella Miller Kids First (GMKF) cohort to associate variation in lncRNAs with CHD. We will then use single-cell RNA-sequencing (scRNA-seq) data to identify lncRNAs expressed in relevant cell types during crucial stages of human cardiogenesis. Our central hypothesis is that variants in lncRNAs are a probable cause of unsolved CHD cases and that, by using scRNA-seq data, we can prioritize candidates for future functional validation. We propose the following specific aims to address this challenge. In Aim 1, we will develop a machine learning (ML) tool to annotate lncRNA variants in our CHD cohort.
Tools to interpret the biological implications of CNVs and SNVs impacting lncRNAs are lacking. In our preliminary work, an ML approach effectively annotated clinically validated CNVs associated with isolated CHD. We will extend our methods to consider CNVs and SNVs impacting lncRNAs as well as those impacting protein-coding genes. In Aim 2, we will apply network analysis to scRNA-seq data to elucidate the role of lncRNAs in heart development. We will associate lncRNA-protein causal relationships with general heart development using inference from gene regulatory networks (GRNs). The GRNs will be built from single-cell transcriptomics data to support the discovery of lncRNAs involved in heart development. This work is innovative because we will be the first to construct an ML tool for cardiac-specific lncRNA variant annotation and to clarify the role that lncRNAs may play in the development of CHD. Completing this project will advance the NHLBI’s mission of creating computational techniques for understanding the mechanisms underlying the regulation of normal heart formation and...