# Biomedical Terminology Quality Assurance for Enhancing Clinical Queries over Electronic Health Records

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2020 · $331,145

## Abstract

PROJECT SUMMARY
We propose to develop an automatic change-suggestion (auto-suggestion) approach for quality enhancement
of biomedical terminologies. This approach can not only detect errors, but also suggest changes that lead to
identifying and fixing the root causes of errors. Biomedical terminologies provide the basis for data
quality in data collection, annotation, management, analysis, sharing, and reuse. They not only serve as a part
of the metadata standards for describing data in the FAIR Data Principles (Findable, Accessible, Interoperable,
Reusable), but also play a vital role in downstream information systems as a declarative knowledge source.
Because of these and additional new roles biomedical terminologies may play, quality issues, if not addressed,
can affect the quality of all downstream information systems and tools (including electronic health record,
clinical decision support and patient safety evaluation systems). Most existing terminology quality assurance
approaches merely indicate the presence of possible quality issues but do not automatically provide
suggestions for fixes. The long-term goal of this study is to develop an approach for AutomatiC
Error-identification and change-Suggestion (ACES), shifting the effort of domain experts and ontology engineers to
validating suggested changes, rather than creating changes. To advance this goal, we propose three
specific aims: Aim 1. To develop an auto-suggestion reasoning framework for automatic error detection in
non-lattice subgraphs by performing Formal Concept Analysis (FCA) on logical definitions of concepts. The
constructed FCA-lattices will serve as logically meaningful reference structures for comparison with the original
non-lattice subgraphs to automatically reveal potential errors as well as suggest remedies. Aim 2. To develop
an automated method to uncover root causes of errors in logical definitions of concepts and suggest remedial
changes in the definitions for evaluation. We will develop a reasoning algorithm to automate the process of
locating erroneous or incomplete logical definitions that lead to the potential errors. Working with domain
experts, we will evaluate randomly selected auto-suggestions using our web-based system to assess the
effectiveness of our error detection and root-cause analysis methods. Aim 3. To quantitatively assess the
terminology quality impact on queries over healthcare data for patient cohort identification. We will leverage
SNOMED CT and a comprehensive EHR database, Cerner Health Facts®, to measure the global impact of
missing and incorrect is-a relations on clinical queries over the EHR database
(missing is-a relations reduce the recall of queries, and incorrect is-a relations reduce their precision).
Our utilization of non-lattice subgraphs is based on a rigorous mathematical theory, which suggests that the
hierarchical relation between ontological concepts should structurally conform to the mathemat...
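
The effect described in Aim 3 can be illustrated with a small sketch. It is not the project's method: the concept names, patient IDs, and helper functions below are all hypothetical, and the toy hierarchy is not taken from SNOMED CT or Cerner Health Facts®. It assumes a cohort query that expands a concept to all of its is-a descendants, then compares the result against a reference hierarchy.

```python
from collections import defaultdict


def descendants(is_a_edges, root):
    """Return root plus every concept that reaches root through is-a edges.

    is_a_edges is an iterable of (child, parent) pairs.
    """
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def cohort(records, concepts):
    """Patient IDs having at least one record coded with a concept in the set."""
    return {patient_id for patient_id, code in records if code in concepts}


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved cohort against a reference cohort."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical reference hierarchy, treated as correct for this illustration.
reference = [
    ("viral_pneumonia", "pneumonia"),
    ("bacterial_pneumonia", "pneumonia"),
]
# Same hierarchy with one is-a relation missing.
with_missing = [("viral_pneumonia", "pneumonia")]
# Same hierarchy with one incorrect is-a relation added.
with_incorrect = reference + [("asthma", "pneumonia")]

# Hypothetical (patient_id, concept) records.
records = [
    (1, "pneumonia"),
    (2, "viral_pneumonia"),
    (3, "bacterial_pneumonia"),
    (4, "asthma"),
]

relevant = cohort(records, descendants(reference, "pneumonia"))
missing_pr = precision_recall(
    cohort(records, descendants(with_missing, "pneumonia")), relevant
)
incorrect_pr = precision_recall(
    cohort(records, descendants(with_incorrect, "pneumonia")), relevant
)

print("missing is-a:   precision=%.2f recall=%.2f" % missing_pr)
print("incorrect is-a: precision=%.2f recall=%.2f" % incorrect_pr)
```

In this toy setup the missing relation drops patient 3 from the cohort, lowering recall while precision stays at 1.0, and the incorrect relation pulls in patient 4, lowering precision while recall stays at 1.0.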

## Key facts

- **NIH application ID:** 9940031
- **Project number:** 1R01LM013335-01
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Licong Cui
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $331,145
- **Award type:** 1
- **Project period:** 2020-08-01 → 2022-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9940031

## Citation

> US National Institutes of Health, RePORTER application 9940031, Biomedical Terminology Quality Assurance for Enhancing Clinical Queries over Electronic Health Records (1R01LM013335-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9940031. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
