# FullMouth: Enhancing Dental Clinical Data and Reducing Disparities through Innovative ML Approaches.

> **NIH R56** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2024 · $722,040

## Abstract

The vast amount of health data created in the United States may hold the key to understanding disease,
improving quality, and lowering healthcare costs. Electronic health records (EHRs), digital collections of patient
healthcare events and observations, are now ubiquitous in medicine and critical to healthcare delivery,
operations, and research. EHR data is often classified as structured or unstructured. Structured EHR data
include standardized diagnoses, medications, and laboratory values in fixed numerical or categorical fields. For
structured data, missing, incomplete, and inconsistent entries are common.
Unstructured data, in contrast, refer to free-form text written by healthcare providers, such as clinical notes and
discharge summaries. Dental care providers often write detailed findings, diagnoses, treatment plans and
prognostic factors in free-text format for clinical care purposes. While this information is easily accessible during
patient care, extracting it to generate meaningful insights in secondary analysis can be challenging. Utilizing
these records requires manual review by domain experts, which can be time-consuming and costly, particularly
when dealing with a large number of patient records. Unstructured data represents about 60% of total EHR data.
Recently, Large Language Models (LLMs) and newer deep learning approaches to Natural Language Processing
(NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a
variety of tasks.
To fully realize the promise of health information technology in dentistry, it is important to address data
missingness and disparity in missingness. Through a periodontal use-case, this proposal will tackle the challenge
of missing structured data and ‘technically’ inaccessible unstructured clinical data. Periodontal disease (advanced gum
disease) is highly pervasive, and unlike caries (whose prevalence has steadily declined over the past
four decades), disease burden and tooth loss secondary to periodontal disease remain intractable. In preliminary
work at two dental institutions, we observed that most patients seen for a comprehensive oral evaluation had
missing or incomplete documentation with respect to clinical periodontal indices/diagnosis, demographic, and
health-related behavior information – all of which are critical in diagnosing and treating periodontal disease. This
significantly limits our ability to learn and improve. Aim 1 will focus on using LLM-based NLP approaches for the
conversion of unstructured note entries into structured and machine-readable information. In Aim 2, we will use
imputation techniques to fill in missing structured clinical data entries. Aim 3 will then evaluate the impact of
reduction in clinical data missingness for both clinical and research applications. This work builds on our prior
work in developing the BigMouth Dental Data Repository (which contains regularly updated ...
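
As a rough illustration of the kind of imputation Aim 2 refers to, the sketch below fills missing values in one structured field with the median of the observed values. This is not the project's method, which the abstract does not specify; median imputation is only the simplest stand-in, and the field name `probing_depth_mm`, the record layout, and the `imputed` flag are all hypothetical.

```python
from statistics import median


def impute_median(records, field):
    """Return copies of `records` with missing `field` values filled by the observed median.

    Each returned record gains an `imputed` flag so that downstream analyses
    can distinguish recorded values from filled-in ones.
    """
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        # Nothing to learn a fill value from; return unmodified copies.
        return [{**r, "imputed": False} for r in records]

    fill = median(observed)
    completed = []
    for r in records:
        missing = r.get(field) is None
        completed.append({**r, field: fill if missing else r[field], "imputed": missing})
    return completed


# Hypothetical structured entries; None marks a value missing from the chart.
records = [
    {"patient": "A", "probing_depth_mm": 3.0},
    {"patient": "B", "probing_depth_mm": None},
    {"patient": "C", "probing_depth_mm": 5.0},
]
completed = impute_median(records, "probing_depth_mm")
```

Keeping the `imputed` flag alongside each value is one way to support the kind of evaluation Aim 3 describes, since analyses can then be run with and without the filled-in entries.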

## Key facts

- **NIH application ID:** 11137246
- **Project number:** 1R56DE034086-01
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Oluwabunmi Tokede
- **Activity code:** R56
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $722,040
- **Award type:** 1
- **Project period:** 2024-09-01 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11137246

## Citation

> US National Institutes of Health, RePORTER application 11137246, FullMouth: Enhancing Dental Clinical Data and Reducing Disparities through Innovative ML Approaches. (1R56DE034086-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/11137246. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
