# Genome-wide Inference of Human Gene Function from Model Organism Data

> **NIH P01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2020 · $258,583

## Abstract

Pathway analysis of genomic data—the use of prior knowledge about how genes function together in biological 
systems—plays an increasingly critical role in gaining biological insights from large-scale genomic studies, and 
particularly in cancer research. However, even the richest source of computer-accessible biological pathway 
information, the Gene Ontology (GO), is very incomplete, hampering pathway analyses. Over the past three 
years, the GO Consortium (GOC) has developed a project that has shown that, by utilizing a rigorous phylogenetic
approach, we can increase the amount of knowledge for human genes by five-fold through careful use of 
experimental data obtained in model organisms such as the mouse, fruit fly, and yeast. The GOC project, 
however, relies on expert human biologists, and will not scale to the entire human genome. Here, we propose 
to develop a computational approach that leverages the experience gained in the GOC project. We will 
develop an accurate, scalable computational solution to the gene function inference problem, which will 
dramatically increase the amount of biological information that can be used in analysis of genome-scale human 
datasets. In brief, the task is to integrate knowledge obtained from experiments across multiple organisms, in 
the context of the family tree that relates the genes, by constructing a probabilistic model of function 
conservation and divergence. The main application of the probabilistic model will be to infer the function of 
human genes from experiments in other organisms. While each gene family will have a specific model
depending on its own unique history, to avoid overfitting we will estimate only a small number of parameters
that are shared across all families. We propose to use the same rigorous model of functional evolution as
employed in the GOC project, which is based on evolutionary gain and loss of different kinds of functions (e.g.,
a catalytic function, a binding function, or even participation in a biological process or pathway), using not only
GO annotations but additional information such as protein domain structure and active sites. We will use the 
manually-curated examples from the GO Consortium as a training set for developing, as well as a test set for 
assessing, our computational inference method. We expect that this work will result in a dramatic increase in 
the number of GO annotations for human genes, resulting in much more informative results from pathway 
analysis, thus generating additional insights into human disease risk, progression and potential therapies. 
While our approach is general, we will focus manual validation on cancer-related pathways in order to ensure 
applicability specifically in cancer research.
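
The abstract describes a probabilistic model of function gain and loss along a gene family tree, with a few parameters shared across families. The following is a minimal illustrative sketch of that general idea, not the project's actual method. It assumes a two-state (function present or absent) continuous-time Markov model with hypothetical `gain` and `loss` rates, and uses Felsenstein's pruning algorithm to estimate the probability that an unannotated human gene has a function, given experimental annotations in mouse, fly, and yeast. The tree shape, branch lengths, rate values, and all function names are assumptions made for illustration only.

```python
import math

def transition(gain, loss, t):
    """Return P[a][b] = Pr(child state b | parent state a) over branch length t."""
    total = gain + loss
    decay = 1.0 - math.exp(-total * t)
    p01 = gain / total * decay   # function gained along the branch
    p10 = loss / total * decay   # function lost along the branch
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def partial(node, obs, gain, loss):
    """Pruning step: likelihood of the data below `node` for each state of `node`."""
    if isinstance(node, str):            # leaf = gene name
        state = obs.get(node)
        if state is None:                # no experimental evidence for this gene
            return [1.0, 1.0]
        return [1.0 - state, float(state)]
    result = [1.0, 1.0]
    for child, t in node:                # internal node = list of (child, branch length)
        p = transition(gain, loss, t)
        below = partial(child, obs, gain, loss)
        for a in (0, 1):
            result[a] *= p[a][0] * below[0] + p[a][1] * below[1]
    return result

def likelihood(tree, obs, gain, loss):
    """Likelihood of the observations, using the stationary distribution at the root."""
    pi1 = gain / (gain + loss)
    root = partial(tree, obs, gain, loss)
    return (1.0 - pi1) * root[0] + pi1 * root[1]

def posterior(tree, obs, target, gain, loss):
    """Pr(target gene has the function | observations), by clamping the target leaf."""
    with_f = likelihood(tree, {**obs, target: 1}, gain, loss)
    without_f = likelihood(tree, {**obs, target: 0}, gain, loss)
    return with_f / (with_f + without_f)

# Hypothetical gene family: ((human, mouse), fly), yeast. Branch lengths are made up.
tree = [
    ([([("human", 0.1), ("mouse", 0.1)], 0.4), ("fly", 0.5)], 0.5),
    ("yeast", 1.0),
]
# Hypothetical experimental annotations: 1 = function observed, 0 = observed absent.
observed = {"mouse": 1, "fly": 1, "yeast": 0}

p_human = posterior(tree, observed, "human", gain=0.1, loss=0.3)
print(f"Pr(human gene has the function) = {p_human:.3f}")
```

Because `gain` and `loss` are the only free parameters in this sketch, the same two values could be reused for every gene family, which mirrors the abstract's point about estimating a small shared parameter set to avoid overfitting. A fuller model would need one such process per function type and a way to fit the rates to curated training annotations.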

## Key facts

- **NIH application ID:** 9991773
- **Project number:** 5P01CA196569-05
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Paul D. Thomas
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $258,583
- **Award type:** 5

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9991773

## Citation

> US National Institutes of Health, RePORTER application 9991773, Genome-wide Inference of Human Gene Function from Model Organism Data (5P01CA196569-05). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/9991773. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
