Protege: A Knowledge-Engineering Environment for Advancing Biomedical Sciences

NIH RePORTER · NIH · R01 · $559,088 · view on reporter.nih.gov ↗

Abstract

Project Abstract The engineering of ontologies that define the entities in an application area and the relationships among them has become essential for modern work in biomedicine. Ontologies help both humans and computers to manage burgeoning numbers of data. The need to annotate, retrieve, and integrate high- throughput data sets, to process natural language, and to build systems for decision support has set many communities of investigators to work building large ontologies. The Protégé system has become an indispensable open-source resource for an enormous international community of scientists—supporting the development, maintenance, and use of ontologies and electronic knowledge bases by biomedical investigators everywhere. The number of registered Protégé users has grown from 3,500 in 2002 to more than 300,000 users as of this writing. The widespread use of ontologies in biomedicine and the availability of tools, such as Protégé, have taken the biomedical field forward to a new set of challenges that current technology has not been designed to address: Biomedical ontologies have grown in size and scope, and their creation, maintenance and quality assurance have become particularly effort-intensive and error-prone. In this proposal, we will develop new methods and tools that will significantly aid biomedical researchers in easily creating and testing biomedical ontologies throughout their lifecycle. Our plan entails four specific aims. First, we will develop methods and tools to allow biomedical scientist to easily create ontologies directly from their source documents, such as spreadsheets, tab indented hierarchies, and document outlines. Second, we will provide the methods and tools to allow biomedical scientist to identify potential “hot spots” in their ontologies that might affect their quality. Third, we will implement a comprehensive, automated testing framework for ontologies that will assist biomedical researchers in performing ontology and data quality assurance throughout the development cycle. Fourth, we will continue to expand and support the thriving Protégé user community, as it grows to include new clinicians and biomedical scientists as they build the ontologies needed to support clinical care, data-driven research, and the elucidation of new discoveries.

Key facts

NIH application ID
9848600
Project number
5R01GM121724-04
Recipient
STANFORD UNIVERSITY
Principal Investigator
Mark A Musen
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$559,088
Award type
5
Project period
2017-01-01 → 2021-12-31