Enhanced ontology engineering through a Web-based, Cloud-based software architecture

NIH RePORTER · NIH · R01 · $236,100

Abstract

This proposal is submitted to supplement grant R01 LM013498-01, “The Metadata Powerwash - Integrated tools to make biomedical data FAIR.” The parent grant proposes to study methods to standardize the metadata in online datasets to make the corresponding data more findable, accessible, interoperable, and reusable. The goal is to transform the metadata that annotate experimental datasets online to a form that adheres to standard reporting guidelines and that uses terms from standard ontologies. The parent grant supports research that depends on the availability of a wide range of biomedical ontologies: standard collections of terms that describe the entities in different application areas. In our work, nearly all the ontologies that we use are created and maintained via a software tool known as Protégé, a system developed in our laboratory that is the most widely used open-source tool for ontology engineering in the world. These ontologies represent the development work of third parties or of members of the team participating in the research supported by the parent grant. Protégé currently exists in two forms: (1) a desktop version written as a Java application, and (2) WebProtégé, a version that operates over the Web. We propose substantial, new software engineering to improve the performance and long-term sustainability of the Protégé system. To support our needs for ontology engineering in conjunction with our existing NLM R01 grant, we propose two specific aims: (1) We will convert WebProtégé to a modern, microservice-based architecture, adding new microservices, including a plug-in architecture that will allow third parties to contribute novel additions to the WebProtégé code base. We will use a software-development approach that will allow us to implement the new architecture in a controlled, incremental manner. (2) We will modernize WebProtégé to make it Cloud-native. We will take advantage of the NIH STRIDES initiative, containerizing the system for deployment in the Google Cloud Platform (GCP) and adapting the software to operate with Cloud-based third-party software for data storage, data queueing, and search. We will also migrate all current WebProtégé users and their projects to the Cloud-based system. Our work will benefit the biomedical community at large, while enhancing our capabilities for ontology engineering as required by our existing grant.

Key facts

NIH application ID: 10405968
Project number: 3R01LM013498-01S1
Recipient: STANFORD UNIVERSITY
Principal Investigator: Mark A Musen
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $236,100
Award type: 3
Project period: 2021-05-01 → 2025-01-31