# Incorporating molecular network knowledge into predictive data-driven models

> **NIH F32** · MICHIGAN STATE UNIVERSITY · 2021 · $70,458

## Abstract

Modern computational techniques based on machine learning (ML) and, more recently, deep learning (DL) are
playing a critical role in realizing the precision medicine initiative. However, there is a pressing need to
systematically combine these powerful data-driven techniques with prior molecular network knowledge, both to build
more accurate predictive models and to explain their predictions in terms of the mechanisms underlying complex
traits and diseases. I propose to use domain-specific knowledge from biology and computing to tackle three
outstanding problems: (1) how can we predict the missing labels associated with millions of publicly available
samples? (2) what molecular and cellular functions can be attached to these samples? and (3) how can we translate
findings from human data to a model species and back?

**Network-constrained Deep Learning for Metadata Imputation:** Most multifactorial phenotypes are
tissue-dependent and manifest differently depending on age, sex, and ethnicity. However, most publicly available
genomic data lack these labels. I will develop a network-guided approach that predicts the missing metadata of
samples from their expression profiles, by designing novel data-driven models in which the model architecture
and/or the structure of the input data are constrained by an underlying gene network.

**Network-guided Functional Analysis of Genomic Data:** High-throughput experiments often generate lists of genes
of interest that are hard to interpret.
Functional enrichment analysis (FEA) is a powerful tool that attaches functional meaning to an experimental
gene set by summarizing it in terms of pathways/processes. However, standard FEA is limited by incomplete
knowledge of gene function, a lack of context from the underlying gene network, and noise in expression data. I
will address these limitations by developing a network-guided approach that jointly embeds genes, their
interactions, and their known biological pathways/processes into a common, low-dimensional space, so that
biological meaning can be derived from the distance between the experimental gene set and the pathway/process
of interest.

**Joint Multi-Species Genomic Data Analysis and Knowledge Transfer:** Finding the optimal model system for a
follow-up study based on genetic signatures derived from human experiments is challenging because genetic
networks can differ substantially from species to species. I propose to use data-driven models to embed
heterogeneous networks composed of human genes and model-species genes into a common, low-dimensional space to
better compare genetic signatures between two (or even multiple) species.

I will apply these methods to three specific tasks, but I emphasize that the results of this study will be
transferable to any other biological problem where complex gene/protein interactions are a major component. I
have surrounded myself with a great support team and developed a strong professional...
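
The metadata-imputation aim leaves the network-constrained architecture unspecified. One common way to constrain a model by a gene network is to mask a layer's weights so that each output unit only receives inputs from a gene's network neighbors. The following is a minimal, dependency-free sketch under that assumption; `build_mask`, `NetworkMaskedLinear`, and the gene names are hypothetical illustrations, not the project's code.

```python
import random
from typing import List, Sequence, Tuple


def build_mask(genes: Sequence[str], edges: Sequence[Tuple[str, str]]) -> List[List[int]]:
    """Gene-by-gene 0/1 mask: 1 on the diagonal and on network edges, 0 elsewhere."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    mask = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        mask[i][j] = mask[j][i] = 1  # treat interactions as undirected
    return mask


class NetworkMaskedLinear:
    """Hypothetical dense layer whose connections are restricted to gene-network edges.

    Output unit i can only read input genes that interact with gene i (plus
    gene i itself), because every weight is multiplied by the mask in forward().
    """

    def __init__(self, mask: List[List[int]], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.mask = mask
        n = len(mask)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]
        self.bias = [0.0] * n

    def forward(self, x: Sequence[float]) -> List[float]:
        out = []
        for i, row in enumerate(self.weights):
            s = self.bias[i]
            for j, w in enumerate(row):
                s += w * self.mask[i][j] * x[j]  # masked entries contribute nothing
            out.append(max(0.0, s))  # ReLU nonlinearity
        return out


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene symbols
    layer = NetworkMaskedLinear(build_mask(genes, [("GENE_A", "GENE_B")]))
    print(layer.forward([1.0, 2.0, 4.0]))
```

Applying the mask inside `forward` (rather than only at initialization) keeps pruned connections inactive regardless of the values stored in those weights. In a full model, layers like this would feed a classifier that predicts labels such as tissue or sex.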
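
For the functional-analysis aim, once genes and pathways share an embedding space, scoring a gene set against a pathway reduces to a distance computation. The sketch below assumes the embeddings already exist (toy 2-D vectors here) and uses cosine similarity between the gene set's centroid and each pathway vector. The centroid aggregation, the similarity measure, and all gene and pathway names are assumptions, not the proposal's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_pathways(
    gene_set: Sequence[str],
    gene_emb: Dict[str, Vector],
    pathway_emb: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Score each pathway by cosine similarity to the gene set's centroid."""
    vecs = [gene_emb[g] for g in gene_set if g in gene_emb]  # skip unembedded genes
    if not vecs:
        raise ValueError("none of the genes in the set have an embedding")
    center = centroid(vecs)
    scores = [(p, cosine(center, v)) for p, v in pathway_emb.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones would come from a trained network model.
    gene_emb = {"G1": [1.0, 0.0], "G2": [0.9, 0.1], "G3": [0.0, 1.0]}
    pathway_emb = {"PATHWAY_X": [1.0, 0.05], "PATHWAY_Y": [0.0, 1.0]}
    for name, score in rank_pathways(["G1", "G2"], gene_emb, pathway_emb):
        print(f"{name}\t{score:.3f}")
```

Averaging the gene vectors first keeps scoring cheap (one comparison per pathway) but can blur gene sets that span several functions; averaging per-gene similarities to the pathway is one alternative.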
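
The multi-species aim proposes embedding a heterogeneous human/model-species network into a common, low-dimensional space but does not name the method. As a simpler stand-in (not the proposal's method), the sketch below links the two species' gene networks with orthology edges and represents a human gene by its random-walk-with-restart (RWR) profile over the joint graph. Model-species genes with the highest RWR scores are then the closest candidates. The `hs:`/`model:` prefixes, the ortholog pairs, and the restart probability of 0.5 are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_joint_graph(
    human_edges: Iterable[Tuple[str, str]],
    model_edges: Iterable[Tuple[str, str]],
    orthologs: Iterable[Tuple[str, str]],
) -> Graph:
    """Undirected graph over prefixed nodes ('hs:' human, 'model:' model species)."""
    graph: Graph = {}

    def add(a: str, b: str) -> None:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for a, b in human_edges:
        add("hs:" + a, "hs:" + b)
    for a, b in model_edges:
        add("model:" + a, "model:" + b)
    for h, m in orthologs:  # assumed ortholog pairs link the two networks
        add("hs:" + h, "model:" + m)
    return graph


def rwr_profile(graph: Graph, seed: str, restart: float = 0.5, iters: int = 100) -> Dict[str, float]:
    """Random walk with restart from `seed`; returns a visiting probability per node."""
    if seed not in graph:
        raise ValueError(f"unknown seed node: {seed}")
    p = {n: 0.0 for n in graph}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for n, nbrs in graph.items():
            share = (1.0 - restart) * p[n] / len(nbrs)  # every node has at least one edge
            for nb in nbrs:
                nxt[nb] += share
        nxt[seed] += restart  # teleport back to the seed
        p = nxt
    return p


def rank_model_genes(profile: Dict[str, float]) -> List[str]:
    """Model-species nodes ordered from most to least reachable from the seed."""
    hits = [(n, s) for n, s in profile.items() if n.startswith("model:")]
    return [n for n, _ in sorted(hits, key=lambda kv: kv[1], reverse=True)]


if __name__ == "__main__":
    graph = build_joint_graph(
        human_edges=[("A", "B")],
        model_edges=[("a", "b"), ("b", "c")],
        orthologs=[("A", "a")],
    )
    print(rank_model_genes(rwr_profile(graph, "hs:A")))
```

RWR profiles are high-dimensional (one entry per node in the joint graph), so reaching the low-dimensional space the aim describes would need a further step, such as factorizing the matrix of profiles.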

## Key facts

- **NIH application ID:** 10246414
- **Project number:** 5F32GM134595-03
- **Recipient organization:** MICHIGAN STATE UNIVERSITY
- **Principal Investigator:** Christopher Andrew Mancuso
- **Activity code:** F32 (Ruth L. Kirschstein NRSA individual postdoctoral fellowship)
- **Funding institute:** NIH, National Institute of General Medical Sciences (NIGMS)
- **Fiscal year:** 2021
- **Award amount:** $70,458
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10246414

## Citation

> US National Institutes of Health, RePORTER application 10246414, Incorporating molecular network knowledge into predictive data-driven models (5F32GM134595-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10246414. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*