# Overcoming explainability and data availability barriers to broad application of ECG ML screening with a system-wide ECG dataset

> **NIH R21** · UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH · 2024 · $115,500

## Abstract

Cardiovascular disease remains a major source of mortality in the US, and detection and prevention remain
lacking. Machine learning (ML) based detection of cardiovascular disease from raw 12-lead electrocardiogram
(ECG) signals is a digital health technique (ML-ECG) that has shown remarkable potential to improve patient
care. However, the best-developed ML tools require large data sets for training, and lack explainability. A major
long-term goal is to develop ML-ECG approaches that can be equitably deployed in clinical care to help detect
rare but serious diseases across broad populations. The current objectives towards this long-term goal are to (1)
implement self-supervised ML methods to train ML-ECG algorithms with smaller datasets and (2) develop novel
ML-interpretability methods for ECG-classification algorithms. The central hypothesis is that application of unique
ML-ECG approaches to this rich existing dataset can reduce the amount of data needed for training and development
of clinically-actionable algorithms, and can provide valuable and as yet unrecognized insight into physiologic
links used in ML-ECG algorithms. The rationale for this project is that: (1) contrastive self-supervised training
allows model development on smaller labeled datasets without significantly compromising accuracy; (2) explainability
analyses through generative adversarial learning can fill gaps in understanding of how these algorithms
are adding to clinical knowledge; and (3) an existing dataset of over 1.4 million clinical ECGs provides an ideal
environment in which to apply these techniques. The central hypothesis will be tested by pursuing 2 specific aims:
(1) Develop contrastive self-supervised training approaches to ECG-classification ML architectures, in order to
reduce the necessary power for highly-accurate predictions; and (2) Apply ML explainability analyses to ECG
classification architectures to understand the features driving predictions. Success in this project will advance the
ML-ECG science to allow its deployment not only in the detection of extremely common disease patterns, but also in the
more difficult and consequential tasks of identifying rare but serious abnormalities. Moreover, the results will
provide insight into how such algorithms work.
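
The abstract does not say which contrastive objective the project uses. As a minimal sketch of the general idea behind Aim 1, the example below assumes an NT-Xent (SimCLR-style) loss over embeddings of two augmented views of each ECG. The function names, the temperature value, and the toy embeddings are illustrative assumptions, not details from the grant.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(views_a, views_b, temperature=0.1):
    """NT-Xent contrastive loss (assumed objective, not confirmed by the source).

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same ECG (a positive pair). Every other embedding in the batch acts as
    a negative for that pair.
    """
    z = [_normalize(v) for v in list(views_a) + list(views_b)]
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same ECG
        # Temperature-scaled cosine similarity to every embedding except itself.
        logits = {
            j: sum(p * q for p, q in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over positive and negative logits.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_denom - logits[pos]
    return total / (2 * n)


# Toy 2-D embeddings: matched pairs point the same way, swapped pairs do not.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_matched = nt_xent_loss(anchors, matched)
loss_swapped = nt_xent_loss(anchors, swapped)
```

In this sketch the loss is small when the two views of the same ECG embed close together and large when they do not. Minimizing it on unlabeled recordings is what would let an encoder learn useful features before fine-tuning on a smaller labeled set.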

## Key facts

- **NIH application ID:** 10809212
- **Project number:** 1R21HL172288-01
- **Recipient organization:** UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
- **Principal Investigator:** BENJAMIN ADAM STEINBERG
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $115,500
- **Award type:** 1
- **Project period:** 2023-12-20 → 2025-11-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10809212

## Citation

> US National Institutes of Health, RePORTER application 10809212, Overcoming explainability and data availability barriers to broad application of ECG ML screening with a system-wide ECG dataset (1R21HL172288-01). Retrieved via AI Analytics 2026-06-01 from https://api.ai-analytics.org/grant/nih/10809212. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*