Knowledge-Based Biomedical Data Science

NIH RePORTER · NIH · R01 · $506,502

Abstract

In the previous funding period, we designed and constructed breakthrough methods for creating a semantically coherent and logically consistent knowledge-base by automatically transforming and integrating many biomedical databases, and by directly extracting information from the literature. Building on decades of work in biomedical ontology development, and exploiting the architectures supporting the Semantic Web, we have demonstrated methods that allow effective querying spanning any combination of data sources in purely biological terms, without the queries having to reflect anything about the structure or distribution of information among any of the sources. These methods are also capable of representing apparently conflicting information in a logically consistent manner, and tracking the provenance of all assertions in the knowledge-base. Perhaps the most important feature of these methods is that they scale to potentially include nearly all knowledge of molecular biology.

We now hypothesize that using these technologies we can build knowledge-bases with broad enough coverage to overcome the “brittleness” problems that stymied previous approaches to symbolic artificial intelligence, and then create novel computational methods which leverage that knowledge to provide critical new tools for the interpretation and analysis of biomedical data. To test this hypothesis, we propose to address the following specific aims:

1. Identify representative and significant analytical needs in knowledge-based data science, and refine and extend our knowledge-base to address those needs in three distinct domains: clinical pharmacology, cardiovascular disease and rare genetic disease.
2. Develop novel and implement existing symbolic, statistical, network-based, machine learning and hybrid approaches to goal-driven inference from very large knowledge-bases. Create a goal-directed framework for selecting and combining these inference methods to address particular analytical problems.
3. Overcome barriers to broad external adoption of developed methods by analyzing their computational complexity, optimizing performance of knowledge-based querying and inference, developing simplified, biology-focused query languages, lightweight packaging of knowledge resources and systems, and addressing issues of licensing and data redistribution.

Key facts

NIH application ID: 10197219
Project number: 5R01LM008111-16
Recipient: UNIVERSITY OF COLORADO DENVER
Principal Investigator: LAWRENCE E HUNTER
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $506,502
Award type: 5
Project period: 2004-09-30 → 2023-06-30