Gene Ontology Consortium and Knowledgebase

NIH RePORTER · NIH · U24 · $2,330,267

Abstract

Because of the staggering complexity of biological systems, biomedical research is becoming increasingly dependent on knowledge stored in a computable form. The Gene Ontology (GO) is by far the largest knowledgebase of how genes function, and has become a critical component of the computational infrastructure enabling the genomic revolution. The GO knowledgebase encodes a computational model of biological systems using modern semantic technologies, and this is the key to its broad adoption and application. It stores vastly more knowledge than one person can know, and therefore enables computational analyses that would otherwise be impossible. It has become indispensable in the interpretation of large-scale molecular measurements in biological research. Crucially for human health research, GO is also one of a suite of complementary ontologies constructed in such a way as to maximally promote interoperability and comparability of data sets. It represents the gene functions and biological processes that can be perturbed in human disease, helping researchers and clinicians to identify genetic contributions to disease. GO is a knowledgebase that can be statistically mined, either standalone or in combination with data from other knowledge resources, which enables researchers to discover connections and form new hypotheses from the biological networks GO represents. All knowledge in GO is represented using semantic web technologies and so is amenable to computational integration and consistency checking. To ensure the knowledge environment meets the requirements of biomedical researchers, we will: 1) Develop and refine the Gene Ontology to reflect current biological knowledge; 2) Coordinate, integrate, and provide GO assertions from multiple sources; 3) Enhance usability of the GO resources for multiple research communities.
We will extend the reach of our Consortium of contributors to efficiently expand the content of the knowledgebase, and develop test sets and challenges to spur the development of machine learning methods for knowledge capture. Our aims reflect the essential requirements for realizing the overarching objectives for a biomedical knowledgebase: efficiently capturing and integrating biological knowledge while adhering to the highest possible standard for accuracy and detail; constructing and providing a robust, flexible, powerful, and extensible technological infrastructure that is as readily available to the wider community as it is for internal use; and, lastly, leveraging state-of-the-art social media, web services, and other technologies to disseminate the GO resource to the entire biomedical research community.

Key facts

NIH application ID: 10631046
Project number: 5U24HG012212-02
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Principal Investigator: CHRISTOPHER J MUNGALL
Activity code: U24
Funding institute: NIH
Fiscal year: 2023
Award amount: $2,330,267
Award type: 5
Project period: 2022-06-01 → 2027-03-31