Leveraging Common Fund data for feature selection in Kids First studies.

NIH RePORTER · NIH · R03 · $363,306

Abstract

This project will pilot a process for identifying multi-variant interactions that contribute to structural birth defects and childhood cancers. The study will focus on oral-facial clefts, congenital diaphragmatic hernia, and congenital heart defects, using whole-genome sequencing (WGS) data from family cohorts in the Gabriella Miller Kids First Pediatric Research Project (KF). Tests that link disorders to genome-wide complex multivariate associations are typically computationally prohibitive. A first step toward making such analyses tractable is therefore to limit the number of variables (genes, variants) tested together, which reduces the number of possible tests. This study will reduce the data for testing by drawing on biological knowledge from two other Common Fund datasets: the Knockout Mouse Phenotyping Program (KOMP2) and the Genotype-Tissue Expression (GTEx) project. KOMP2 has generated extensive information on mouse knockout developmental phenotypes that is relevant for matching genes and phenotypes in KF WGS studies. GTEx data can be merged with KF loci and relevant tissue-to-phenotype relationships. Using features from these Common Fund data and annotations, we can generate selected subsets of KF variants and genes as feature-reduced KF data. A comprehensive machine learning (ML) analysis pipeline will then be used to identify candidate risk factors and characterize complex patterns of association in these feature-reduced KF data. In addition to the more traditional univariate association analyses of genotype vs. phenotype, this pipeline will identify complex associations including (1) context-dependent genetic effects resulting from non-additive multi-variant interactions, i.e., epistasis, and (2) subgroup-specific associations, i.e., phenotypic and genotypic heterogeneity, in which different etiological paths lead to the same or similar phenotypes in the selected KF subject group. The pipeline will include feature selection, modeling, and interpretation of multi-variant interactions. The outcomes of this study will include (1) pipelines for integrating Common Fund data into Kids First datasets, (2) integrated KF-KOMP2-GTEx datasets, including cross-species integration, (3) ML pipelines for multi-variant interaction analyses of phenotype vs. genotype in selected, reduced-feature KF data, and (4) results from these pipelines on multi-variant interactions for later hypothesis testing.
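The feature-reduction idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the variant and gene names are toy placeholders, the gene sets stand in for KOMP2- and GTEx-derived annotations, and requiring a gene to appear in both sets (an intersection) is only one possible selection rule, not the project's stated method. The sketch also counts pairwise interaction tests before and after filtering to show why limiting the variables matters.

```python
from math import comb

# Hypothetical toy inputs; none of these values come from KF, KOMP2, or GTEx.
kf_variants = [
    {"id": "var1", "gene": "GENE_A"},
    {"id": "var2", "gene": "GENE_B"},
    {"id": "var3", "gene": "GENE_C"},
    {"id": "var4", "gene": "GENE_D"},
    {"id": "var5", "gene": "GENE_A"},
    {"id": "var6", "gene": "GENE_E"},
]
# Stand-in for genes with a relevant mouse knockout phenotype (assumed).
komp2_phenotype_genes = {"GENE_A", "GENE_C"}
# Stand-in for genes expressed in a phenotype-relevant tissue (assumed).
gtex_tissue_genes = {"GENE_A", "GENE_C", "GENE_D"}


def select_features(variants, *gene_sets):
    """Keep variants whose gene appears in every supplied gene set.

    Using the intersection is an illustrative choice; a union or a
    weighted score would be equally plausible selection rules.
    """
    if not gene_sets:
        return list(variants)
    allowed = set.intersection(*(set(g) for g in gene_sets))
    return [v for v in variants if v["gene"] in allowed]


def pairwise_tests(n_features):
    """Number of distinct two-feature interaction tests among n features."""
    return comb(n_features, 2)


reduced = select_features(kf_variants, komp2_phenotype_genes, gtex_tissue_genes)

print("variants before:", len(kf_variants), "pairwise tests:", pairwise_tests(len(kf_variants)))
print("variants after: ", len(reduced), "pairwise tests:", pairwise_tests(len(reduced)))
```

Because the number of pairwise tests grows quadratically with the number of features (and higher-order interactions grow faster still), even a modest knowledge-based filter shrinks the search space substantially.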

Key facts

NIH application ID: 10112014
Project number: 1R03OD030600-01
Recipient: CHILDREN'S HOSP OF PHILADELPHIA
Principal Investigator: Deanne Marie Taylor
Activity code: R03
Funding institute: NIH
Fiscal year: 2020
Award amount: $363,306
Award type: 1
Project period: 2020-09-18 → 2023-08-31