Project Summary

The proposed study seeks to identify genomic features that explain epidemiological co-occurrences of childhood cancers (CCs) and structural birth defects (SBDs). We will import germline and genomics data from affected cohorts into the Common Fund Data Ecosystem (CFDE) Data Distillery Knowledge Graph (DDKG) project, an ongoing CFDE project that has generated a comprehensively annotated graph database built from empirical data from 11 Common Fund projects, with over 40 million data points and 300 million relationships, and whose utility has been demonstrated in several complex use cases. Our first goal is to expand and update the DDKG schema to support a broader spectrum of genomic data types and to weight edges (relationships) by evidence level. This expansion will increase the DDKG’s information capacity and better support machine learning applications on extracted data. The datasets chosen for this project are based on epidemiological observations of relationships between congenital heart defects and neuroblastoma or hematological malignancies, and between brain or CNS congenital defects and brain tumors. Germline and tumor data from representative cohorts with either or both of the selected CCs and SBDs will be obtained from the Kids First project. We will also incorporate into the DDKG genomics data from the NCI Molecular Targets Project, a comprehensive repository of childhood cancer genomics data produced by the lead principal investigator. We will analyze the DDKG data for predicted relationships between SBDs and CCs using strategies that include topological link prediction methods, the Connect the Dots algorithm, dimensionality reduction methods (such as embeddings) with cluster detection, and machine learning with PyG’s support for Graph Neural Networks (GNNs) on heterogeneous graphs.
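As a minimal illustration of what topological link prediction between SBD and CC nodes could look like, the sketch below ranks unconnected SBD and CC pairs by the Adamic-Adar index over their shared neighbors. The node names, the edges, and the choice of Adamic-Adar as the scoring function are all assumptions made for illustration; they are not DDKG content and are not necessarily the method the project will adopt.

```python
import math

# Hypothetical toy graph. Node names and edges are illustrative placeholders,
# not data from the DDKG, Kids First, or any other source.
edges = [
    ("SBD:heart_defect", "GENE:A"),
    ("SBD:heart_defect", "GENE:B"),
    ("CC:neuroblastoma", "GENE:A"),
    ("CC:neuroblastoma", "GENE:B"),
    ("CC:brain_tumor", "GENE:C"),
    ("SBD:cns_defect", "GENE:C"),
    ("SBD:cns_defect", "GENE:B"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbor nodes."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum 1/log(degree) over the neighbors shared by u and v.

    Shared neighbors with few connections contribute more than hubs.
    """
    score = 0.0
    for w in adj[u] & adj[v]:
        degree = len(adj[w])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score


def rank_sbd_cc_pairs(adj):
    """Score every SBD/CC pair that is not already directly linked."""
    sbds = sorted(n for n in adj if n.startswith("SBD:"))
    ccs = sorted(n for n in adj if n.startswith("CC:"))
    scored = [
        (adamic_adar(adj, s, c), s, c)
        for s in sbds
        for c in ccs
        if c not in adj[s]
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


adjacency = build_adjacency(edges)
ranking = rank_sbd_cc_pairs(adjacency)
for score, sbd, cc in ranking:
    print(f"{sbd} -- {cc}: {score:.3f}")
```

In this toy graph the heart defect and neuroblastoma pair ranks first because the two nodes share two gene neighbors. On a real knowledge graph, a score of this kind would serve only as a candidate-generation step ahead of the embedding and GNN analyses.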
Data will be delivered to users through the DDKG project’s pre-built tools and through new data delivery methods that we will develop and refine. This will make the project’s findings more accessible and extend the utility of the DDKG for the broader research community. By analyzing genomics data from large pediatric cohorts, we seek to set a precedent for large-scale genomics analyses using Common Fund data while providing significant insights into the genetic drivers of CCs and SBDs, paving the way for future research and clinical applications. Other researchers will be able to use the DDKG together with our methodological developments, increasing opportunities to reuse CFDE data.