# Leveraging Common Fund data for feature selection in Kids First studies.

> **NIH R03** · CHILDREN'S HOSP OF PHILADELPHIA · 2020 · $363,306

## Abstract

This project will pilot a process for identifying multi-variant interactions that contribute to structural birth
defects and childhood cancers. The study will focus on oral-facial clefts, congenital diaphragmatic
hernia, and congenital heart defects, using whole-genome sequencing (WGS) data from family cohorts in
the Gabriella Miller Kids First Pediatric Research Project (KF).

Tests that link disorders to genome-wide, complex multivariate associations are typically computationally
prohibitive. A first step toward making such analyses tractable is to limit the number of variables
(genes, variants) tested together, which reduces the number of possible tests. This study will reduce
the data to be tested by drawing on biological knowledge from two other Common Fund datasets: the
Knockout Mouse Phenotyping Program (KOMP2) and the Genotype-Tissue Expression (GTEx) project.
KOMP2 has generated extensive information on mouse knockout developmental phenotypes that is relevant
for matching genes and phenotypes in KF WGS studies. GTEx data can be linked to KF loci and to relevant
tissue-to-phenotype relationships. Thus, using features from other Common Fund data and
annotations, we can generate selected subsets of KF variants and genes as feature-reduced KF data.
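
As a rough illustration of why restricting the feature set matters, the sketch below filters a toy variant list by gene-level annotations and counts how many pairwise tests remain. Everything in it is an assumption made for illustration: the variant and gene names, the two annotation sets standing in for KOMP2- and GTEx-derived evidence, the helper `reduce_features`, and the choice to keep a variant when its gene appears in either set. The abstract does not specify the actual selection rule.

```python
from math import comb

# Hypothetical toy data: variant ID -> gene symbol (all names are illustrative).
variant_to_gene = {
    "var1": "GENE_A",
    "var2": "GENE_B",
    "var3": "GENE_C",
    "var4": "GENE_A",
    "var5": "GENE_D",
    "var6": "GENE_E",
}

# Hypothetical gene sets standing in for KOMP2- and GTEx-derived annotations.
komp2_phenotype_genes = {"GENE_A", "GENE_C"}  # knockout has a relevant developmental phenotype
gtex_tissue_genes = {"GENE_A", "GENE_D"}      # expressed in a phenotype-relevant tissue


def reduce_features(variant_to_gene, *gene_sets):
    """Keep variants whose gene appears in at least one supporting gene set.

    Using the union of the sets is an assumption; an intersection or a
    weighted score would be equally plausible selection rules.
    """
    supported = set().union(*gene_sets)
    return sorted(v for v, g in variant_to_gene.items() if g in supported)


selected = reduce_features(variant_to_gene, komp2_phenotype_genes, gtex_tissue_genes)

# Number of two-variant combinations before and after feature reduction.
pairs_before = comb(len(variant_to_gene), 2)
pairs_after = comb(len(selected), 2)

print(selected)
print(pairs_before, pairs_after)
```

The number of pairwise tests grows quadratically with the number of variants, and higher-order combinations grow faster still, so even a modest reduction in features removes a large share of candidate tests.
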

A comprehensive machine learning (ML) analysis pipeline will then be used to identify candidate risk
factors and to characterize complex patterns of association within these feature-reduced
KF data. In addition to the more traditional univariate association analyses of genotype vs.
phenotype, this pipeline will identify complex associations, including (1) context-dependent genetic effects
resulting from non-additive multi-variant interactions, i.e., epistasis, and (2) subgroup-specific associations,
i.e., phenotypic and genotypic heterogeneity, where different etiological paths lead to the same or similar
phenotypes in the selected KF subject group. The pipeline will include feature selection, modeling, and
interpretation of multi-variant interactions.
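
To make the contrast between univariate analysis and epistasis concrete, here is a minimal sketch on synthetic data. It is not the project's pipeline, whose methods the abstract does not name. It assumes an extreme, textbook-style case: two binary variants whose effect on a binary phenotype follows an exclusive-or pattern, so that each variant shows no association on its own. The cohort, the 25 samples per genotype combination, and the helper names are all hypothetical.

```python
from itertools import product

# Synthetic, balanced toy cohort: each row is (variant_a, variant_b, phenotype).
# Assumption for illustration: phenotype = a XOR b, a purely non-additive effect.
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(25)]


def case_rate(rows):
    """Fraction of rows with phenotype == 1."""
    return sum(row[2] for row in rows) / len(rows)


def marginal_rates(samples, idx):
    """Case rate by genotype at a single variant (a univariate view)."""
    return {g: case_rate([s for s in samples if s[idx] == g]) for g in (0, 1)}


def joint_rates(samples):
    """Case rate by the combined genotype at both variants."""
    return {
        (a, b): case_rate([s for s in samples if s[0] == a and s[1] == b])
        for a, b in product((0, 1), repeat=2)
    }


print(marginal_rates(samples, 0))  # each genotype has the same case rate
print(marginal_rates(samples, 1))  # same for the second variant
print(joint_rates(samples))        # the signal appears only in combination
```

In this toy cohort a per-variant test sees a 50% case rate regardless of genotype, while the joint genotype separates cases from controls completely. Real interactions are far noisier, which is part of why searching for them across many variants is expensive.
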

The outcomes of this study will include (1) pipelines for integrating Common Fund data into Kids First
datasets, (2) integrated KF-KOMP2-GTEx datasets, including cross-species integration, (3) ML pipelines for
multi-variant interaction analyses of phenotype vs. genotype in selected, reduced-feature KF data, and (4) results
from these pipelines on multi-variant interactions, for later hypothesis testing.

## Key facts

- **NIH application ID:** 10112014
- **Project number:** 1R03OD030600-01
- **Recipient organization:** CHILDREN'S HOSP OF PHILADELPHIA
- **Principal Investigator:** Deanne Marie Taylor
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $363,306
- **Award type:** 1
- **Project period:** 2020-09-18 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10112014

## Citation

> US National Institutes of Health, RePORTER application 10112014, Leveraging Common Fund data for feature selection in Kids First studies. (1R03OD030600-01). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10112014. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
