# U24-Uncovering the Shared Genetic Origins of Childhood Cancer and Structural Birth Defects Through Enhanced Data Integration and Analysis with the CFDE Data Distillery Knowledge Graph.

> **NIH U24** · CHILDREN'S HOSP OF PHILADELPHIA · 2024 · $1,489,695

## Abstract

The proposed study seeks to identify genomic features that explain epidemiological co-occurrences of childhood cancers (CCs) and structural birth defects (SBDs). We will import germline and genomics data from affected cohorts into the Common Fund Data Ecosystem (CFDE) Data Distillery Knowledge Graph (DDKG), an ongoing CFDE project that has generated a comprehensively annotated graph database built with empirical data from 11 Common Fund projects, with over 40 million data points and 300 million relationships, and whose utility has been demonstrated through successful applications to several complex use cases.

Our first goal for this proposal is to expand and update the DDKG schema to support a broader spectrum of genomic data types and edge (relationship) weighting by evidence level. This expansion will increase the DDKG’s information capacity and better support machine learning applications on extracted data. Datasets for this project were chosen based on epidemiological observations of relationships between congenital heart defects and neuroblastoma or hematological malignancies, and between brain or CNS congenital defects and brain tumors. Data from representative cohorts with either or both of the selected CCs and SBDs will be obtained from the Kids First project as germline and tumor data. We will also incorporate into the DDKG genomics data from the NCI Molecular Targets Project, which represents a comprehensive repository of childhood cancer genomics data produced by the lead principal investigator.
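
As a rough illustration of what edge weighting by evidence level could look like, the sketch below attaches a numeric weight to each relationship based on an evidence tier. The tier names, the weight values, the `Edge` class, and the node labels are all hypothetical placeholders; the actual DDKG schema is not described in this abstract and may differ.

```python
from dataclasses import dataclass

# Hypothetical evidence tiers and weights, chosen only for illustration.
EVIDENCE_WEIGHTS = {"curated": 1.0, "experimental": 0.7, "predicted": 0.3}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    evidence: str  # must be a key of EVIDENCE_WEIGHTS

    @property
    def weight(self) -> float:
        # Translate the evidence tier into a numeric edge weight.
        return EVIDENCE_WEIGHTS[self.evidence]


def filter_by_weight(edges, min_weight):
    """Keep edges at or above min_weight, strongest first."""
    kept = [e for e in edges if e.weight >= min_weight]
    return sorted(kept, key=lambda e: e.weight, reverse=True)


# Placeholder nodes; these are not real gene or phenotype identifiers.
edges = [
    Edge("gene_1", "defect_A", "associated_with", "predicted"),
    Edge("gene_1", "cancer_B", "associated_with", "curated"),
    Edge("gene_2", "cancer_B", "associated_with", "experimental"),
]
strong = filter_by_weight(edges, min_weight=0.5)
```

Storing the tier on the edge and deriving the weight from a lookup table keeps the weighting scheme adjustable without rewriting the graph data.
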

We will analyze the DDKG data for predicted relationships between SBDs and CCs using strategies that include topological link prediction methods, the Connect the Dots algorithm, dimensionality reduction methods (such as embeddings) with cluster detection, and machine learning with PyG’s support for Graph Neural Networks (GNNs) on heterogeneous graphs. Data will be delivered to users through the DDKG project’s pre-built tools and through innovative data delivery methods that we will develop and refine. This will enhance the accessibility of the project's findings and extend the utility of the DDKG for the broader research community.
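
To make "topological link prediction" concrete, here is a minimal sketch using the Adamic-Adar index, one widely used topological score. The abstract does not say which specific methods the project will apply, so this choice is an assumption, and the toy graph, node labels, and function names are hypothetical.

```python
import math


def build_adjacency(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum 1 / log(degree) over the common neighbors of u and v."""
    score = 0.0
    for w in adj.get(u, set()) & adj.get(v, set()):
        degree = len(adj[w])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score


def rank_links(adj, sources, targets):
    """Score unlinked (source, target) pairs, highest score first."""
    scored = [
        (s, t, adamic_adar(adj, s, t))
        for s in sources
        for t in targets
        if s != t and t not in adj.get(s, set())
    ]
    return sorted((x for x in scored if x[2] > 0), key=lambda x: x[2], reverse=True)


# Toy graph with placeholder labels, not real DDKG content.
adj = build_adjacency([
    ("defect_A", "gene_1"), ("defect_A", "gene_2"),
    ("cancer_B", "gene_1"), ("cancer_B", "gene_2"),
    ("cancer_C", "gene_2"),
])
ranked = rank_links(adj, ["defect_A"], ["cancer_B", "cancer_C"])
```

In this toy graph, `defect_A` and `cancer_B` share two neighbors while `defect_A` and `cancer_C` share one, so the first pair ranks higher. The score also down-weights shared neighbors that are highly connected, since a hub node is weaker evidence of a specific relationship.
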

By analyzing large-scale pediatric cohort genomics data, we seek to set a precedent for large-scale genomics data analyses using Common Fund data while providing significant insights into the genetic drivers of CCs and SBDs, paving the way for future research and clinical applications. Other researchers will be able to use the DDKG together with the methods we develop, increasing the opportunities to reuse CFDE data.

## Key facts

- **NIH application ID:** 10994331
- **Project number:** 1U24OD038422-01
- **Recipient organization:** CHILDREN'S HOSP OF PHILADELPHIA
- **Principal Investigator:** Sharon Diskin
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $1,489,695
- **Award type:** 1
- **Project period:** 2024-09-19 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10994331

## Citation

> US National Institutes of Health, RePORTER application 10994331, U24-Uncovering the Shared Genetic Origins of Childhood Cancer and Structural Birth Defects Through Enhanced Data Integration and Analysis with the CFDE Data Distillery Knowledge Graph. (1U24OD038422-01). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10994331. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
