BioGPS, BioThings and BioReel: illuminating dark data for biomedical research

NIH RePORTER · NIH · R01 · $483,750

Abstract

The overall goal of this project is to promote the accessibility and dissemination of biomedical information so that the research community can better leverage existing knowledge. Science is most efficient when hypotheses are based on the entirety of knowledge available to date. Unfortunately, up-to-date and comprehensive access to relevant knowledge is rarely achieved. This proposal puts a particular emphasis on illuminating biomedical “dark data.” By analogy to the dark matter that is unaccounted for in the universe, dark data is defined by being unseen or underutilized by the scientific community. In this project, we will continue to strengthen our currently widely used applications BioGPS and MyGene.info, and also develop two new applications: BioThings and BioReel. Collectively, these applications aim to make dark data resources Findable, Accessible, Interoperable, and Reusable (FAIR). BioGPS and BioReel are designed for non-computational scientists. BioGPS (http://biogps.org) is a gene portal for aggregating information on human genes and proteins. It illuminates dark data by creating a simple platform to discover and access gene-centric websites. BioGPS users can benefit each other by sharing the specific resources they have discovered and how they use or like them. BioReel will be developed as a tool to periodically monitor relevant resources for researchers and notify them when the knowledge about their genes of interest has been updated (e.g., new datasets available, annotation in a new pathway). MyGene.info and BioThings are designed for bioinformatics developers, who often face source data that is fragmented in both content and format. As a result, almost every bioinformatician has to repeat a significant amount of data wrangling. We developed MyGene.info to integrate gene and protein annotation data into a simple, high-performance web Application Programming Interface (API).
It illuminates dark data on gene and protein annotations by pre-integrating over 200 annotation types in a standardized format. In this proposal, we will continue to expand MyGene.info to include additional highly requested annotations, both from a major data repository and from smaller domain-specific data sources. In addition, we will generalize the infrastructure and the software pattern underlying the MyGene.info project into a generic API framework called the “BioThings SDK”. Two new APIs will be built using this framework, focusing on drugs/chemicals and diseases respectively, where data fragmentation across resources is equally a problem.

Key facts

NIH application ID: 10124400
Project number: 5R01GM083924-13
Recipient: SCRIPPS RESEARCH INSTITUTE, THE
Principal Investigator: ANDREW I SU
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $483,750
Award type: 5
Project period: 2008-08-01 → 2023-03-31