Biomedical Terminology Quality Assurance for Enhancing Clinical Queries over Electronic Health Records

NIH RePORTER · NIH · R01 · $331,145

Abstract

PROJECT SUMMARY

We propose to develop an automatic change-suggestion (auto-suggestion) approach for quality enhancement of biomedical terminologies. This approach can not only detect errors, but also suggest changes that lead to the identification and repair of the root causes of those errors. Biomedical terminologies provide the basis for data quality in data collection, annotation, management, analysis, sharing, and reuse. They not only serve as part of the metadata standards for describing data in the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), but also play a vital role in downstream information systems as a declarative knowledge source. Because of these and additional new roles biomedical terminologies may play, quality issues, if not addressed, can affect the quality of all downstream information systems and tools (including electronic health record, clinical decision support, and patient safety evaluation systems). Most existing terminology quality assurance approaches merely indicate the presence of possible quality issues but do not automatically suggest fixes. The long-term goal of this study is to develop an approach for AutomatiC Error-identification and change-Suggestion (ACES), shifting the effort of domain experts and ontology engineers from creating changes to validating suggested changes. To advance this goal, we propose three specific aims:

Aim 1. To develop an auto-suggestion reasoning framework for automatic error detection in non-lattice subgraphs by performing Formal Concept Analysis (FCA) on logical definitions of concepts. The constructed FCA-lattices will serve as logically meaningful reference structures for comparison with the original non-lattice subgraphs to automatically reveal potential errors as well as suggest remedies.

Aim 2. To develop an automated method to uncover root causes of errors in logical definitions of concepts and suggest remedial changes in the definitions for evaluation. We will develop a reasoning algorithm to automate the process of locating erroneous or incomplete logical definitions that lead to the potential errors. Working with domain experts, we will evaluate randomly selected auto-suggestions using our web-based system to assess the effectiveness of our error detection and root-cause analysis methods.

Aim 3. To quantitatively assess the impact of terminology quality on queries over healthcare data for patient cohort identification. We will leverage SNOMED CT and Cerner Health Facts®, a comprehensive EHR database, to measure the global impact of missing and incorrect is-a relations on clinical queries over the EHR database (missing is-a relations reduce query recall, and incorrect is-a relations reduce query precision).

Our utilization of non-lattice subgraphs is based on a rigorous mathematical theory, which suggests that the hierarchical relation between ontological concepts should structurally conform to the mathemat...
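The Aim 3 claim that missing is-a relations lower recall while incorrect is-a relations lower precision can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the concept names, the patient records, the one-code-per-patient simplification, and the helper functions `descendants`, `cohort`, and `precision_recall` are hypothetical and are not taken from SNOMED CT, Cerner Health Facts®, or the project's own methods.

```python
from collections import defaultdict


def descendants(is_a_edges, root):
    """Return root plus every concept that reaches root via is-a edges."""
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def cohort(is_a_edges, query_concept, patient_codes):
    """Select patients whose code is the query concept or one of its descendants."""
    codes = descendants(is_a_edges, query_concept)
    return {pid for pid, code in patient_codes.items() if code in codes}


def precision_recall(retrieved, relevant):
    """Compute precision and recall of a retrieved cohort against a reference cohort."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall


# Toy hierarchy as (child, parent) pairs; hypothetical, not real terminology content.
correct = {
    ("pneumonia", "lung_disease"),
    ("viral_pneumonia", "pneumonia"),
    ("asthma", "lung_disease"),
}
# Variant with one is-a relation missing.
missing = correct - {("viral_pneumonia", "pneumonia")}
# Variant with one incorrect is-a relation added.
incorrect = correct | {("wrist_fracture", "pneumonia")}

# Hypothetical patients, each carrying a single diagnosis code for simplicity.
patients = {
    "p1": "pneumonia",
    "p2": "viral_pneumonia",
    "p3": "asthma",
    "p4": "wrist_fracture",
}

# The cohort under the correct hierarchy serves as the reference answer.
relevant = cohort(correct, "pneumonia", patients)

for label, edges in [("correct", correct), ("missing", missing), ("incorrect", incorrect)]:
    retrieved = cohort(edges, "pneumonia", patients)
    precision, recall = precision_recall(retrieved, relevant)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting, dropping the is-a edge loses the patient coded with the child concept, so recall falls while precision is unchanged. Adding the wrong edge pulls in an unrelated patient, so precision falls while recall is unchanged.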

Key facts

NIH application ID: 9940031
Project number: 1R01LM013335-01
Recipient: UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Principal Investigator: Licong Cui
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $331,145
Award type: 1
Project period: 2020-08-01 → 2022-07-31