# Predicting Phenotype by Deep Learning Heterogeneous Multi-Omics Data

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2024 · $318,649

## Abstract

Project Summary
Complex diseases and traits are caused by dynamic genetic regulation and environmental interactions.
Numerous genetic, genomic, and phenotypic datasets have been generated, including genotypes, gene
expression, epigenetic changes, and electronic medical records (EMRs). Currently, a main challenge is the
development of novel informatics approaches to effectively link phenotype with genomic information.
Specifically, genome-wide association studies (GWAS) have reported several thousand single nucleotide
polymorphisms (SNPs) that are significantly associated with diseases and traits; however, more than 80% of
them are noncoding variants, making it difficult to interpret their potential disease-causal roles. We and others
have systematically examined how phenotypic variability in disease risk for a broad spectrum of disease
phenotypes can be explained by regulatory variants. We now hypothesize that such regulation occurs in a
tissue-specific, cell type-specific, and developmental stage-specific (TCD-specific) manner. Importantly, large
genomic consortia such as ENCODE, FANTOM5, Roadmap Epigenomics, and GTEx have continuously
generated high-quality functional data for annotating genome-wide variants. The emerging single-cell
sequencing technologies have enabled us to examine how genetic variants affect cellular functions within
individual cells or specific cell types. This gives us an unprecedented opportunity to develop novel statistical
and computational approaches for a deeper understanding of the genetic architecture of phenotype. In this
proposal, we combine bioinformatics, single cell omics, deep learning, and phenotype and EMR data mining to
develop novel analytical strategies that maximally leverage information from both genotype and expression
from massive heterogeneous data, aiming to predict phenotype by functional assessment of DNA variation at
the TCD-specific level. We propose the following three specific aims. (1) To develop a deep learning
variant impact predictor, DeepVIP, that maximally utilizes functional and regulatory data to predict the
causal roles of variants in complex diseases and traits. (2) To develop phenotype-specific network approaches
to resolve genotype-phenotype relationships in a spatiotemporal manner and at single-cell resolution. We will
develop a novel method, single-cell dense module search of GWAS signals (scGWAS), and a graph
neural network approach, GNN-scTP, to detect driving roles of genes from single-cell RNA-seq data. These
methods can effectively identify critical regulatory modules and genes in complex diseases in a TCD-specific
manner. (3) To apply the methods to 16 neurodevelopmental and neurodegenerative disorders and related
traits, as well as broad phenotypes, using Vanderbilt biobank (BioVU) and UK Biobank data, both of which have
genotypes linked with rich phenotypic information. Our proposal is timely and innovative to study the genetic
architecture in human complex ...

## Key facts

- **NIH application ID:** 10847443
- **Project number:** 5R01LM012806-08
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Zhongming Zhao
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $318,649
- **Award type:** 5
- **Project period:** 2017-09-14 → 2025-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10847443

## Citation

> US National Institutes of Health, RePORTER application 10847443, Predicting Phenotype by Deep Learning Heterogeneous Multi-Omics Data (5R01LM012806-08). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10847443. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
