# Using Data Integration and Predictive Analytics to Improve Diagnosis-Based Performance Measures

> **NIH VA I01** · VETERANS AFFAIRS MED CTR SAN FRANCISCO · 2021 · —

## Abstract

Background: VA performance monitoring makes extensive use of diagnosis-based quality measures that track
delivery of care only among patients who have qualifying ICD-9 diagnosis codes. Diagnosis-based measures
can be calculated using existing VA data, allowing for low-cost, near real-time performance monitoring.
However, diagnosis-based measures can have critical validity problems if the targeted condition is under- or
over-diagnosed to differing degrees across facilities. When variation in diagnosing and coding occurs, facility
rankings on measured performance can be misleading: high-performing facilities can score poorly,
low-performing facilities can score well, and facilities with the same real performance can fall at opposite ends of
the facility rank distribution. Use of diagnosis-based process measures can therefore undermine one of the
primary purposes of quality measurement: the comparison of facilities and systems. In addition,
diagnosis-based measures cannot be used to detect gaps in access to care for patients who have a targeted condition
but no qualifying diagnosis code. Finally, when diagnosis rates vary across patient subgroups, diagnosis-based
measures cannot be used to detect and act on healthcare disparities. Problems with diagnosis-based
measures could be remedied if true prevalence data were available: Comparisons of performance based on
diagnosis- versus prevalence-based measures would detect facilities with anomalous diagnosis rates and
distinguish variation in true performance from variation in case-finding. However, for many conditions, the
electronic health record (EHR) does not contain data on true prevalence.

Objectives: The goal of the proposed project is to develop a general method for improving diagnosis-based
measures when valid prevalence data are not readily available. We propose to build a model for predicting
prevalence using multiple sources of existing data and to validate it through a one-time collection of gold
standard outcome data (survey-based SUD prevalence). Leveraging existing data with targeted collection of
model development and validation data is a cost-effective strategy to improve diagnosis-based measures
without requiring ongoing, expensive disease surveillance. Focusing on substance use disorder (SUD) care as
an example, the objectives of this study are to: (a) assess the degree of SUD under- or over-diagnosis by
comparing the proportion of patients with coded SUD diagnoses in the VA administrative data to SUD
prevalence estimates obtained using a validated measure in a patient survey conducted at 30 VA healthcare
systems; (b) refine and validate a model for predicting SUD prevalence among VA patients using multiple
existing data sources; and (c) assess disparities in SUD diagnosis by comparing diagnosis rates to
survey-based SUD prevalence estimates across patient age, sex, and racial/ethnic groups.

Methods: We will collect data on DSM-IV and DSM-5-concordant SUD among VA patients using a va...
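
The ranking problem described in the Background can be illustrated with a small sketch. It is not the project's method: the two facilities and all counts are invented, the names `Facility`, `diagnosis_based_rate`, `prevalence_based_rate`, and `case_finding_ratio` are hypothetical, and it assumes every diagnosed patient is a true case and that care is recorded only for diagnosed patients.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    true_cases: int  # hypothetical count of patients who truly have the condition
    diagnosed: int   # patients with a qualifying diagnosis code
    treated: int     # diagnosed patients who received the measured care


def diagnosis_based_rate(f: Facility) -> float:
    """Share of diagnosed patients who received care."""
    return f.treated / f.diagnosed


def prevalence_based_rate(f: Facility) -> float:
    """Share of all patients with the condition who received care."""
    return f.treated / f.true_cases


def case_finding_ratio(f: Facility) -> float:
    """Diagnosed patients per true case; below 1 suggests under-diagnosis."""
    return f.diagnosed / f.true_cases


def rank(facilities, measure):
    """Facility names ordered from best to worst on the given measure."""
    return [f.name for f in sorted(facilities, key=measure, reverse=True)]


# Invented numbers for illustration only; these are not study data.
facilities = [
    Facility("A", true_cases=1000, diagnosed=800, treated=400),
    Facility("B", true_cases=1000, diagnosed=300, treated=240),
]

for f in facilities:
    print(f.name, diagnosis_based_rate(f), prevalence_based_rate(f), case_finding_ratio(f))

print(rank(facilities, diagnosis_based_rate))   # B ranks first
print(rank(facilities, prevalence_based_rate))  # A ranks first
```

In this made-up case, facility B looks better on the diagnosis-based measure only because it codes fewer of its true cases. Comparing the two rankings alongside the case-finding ratio is one way to separate variation in performance from variation in case-finding.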

## Key facts

- **NIH application ID:** 10457091
- **Project number:** 7I01HX002128-05
- **Recipient organization:** VETERANS AFFAIRS MED CTR SAN FRANCISCO
- **Principal Investigator:** Katherine JoAnn Hoggatt
- **Activity code:** I01
- **Funding institute:** VA
- **Fiscal year:** 2021
- **Award amount:** —
- **Award type:** 7
- **Project period:** 2017-01-01 → 2021-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10457091

## Citation

> US National Institutes of Health, RePORTER application 10457091, Using Data Integration and Predictive Analytics to Improve Diagnosis-Based Performance Measures (7I01HX002128-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10457091. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*