# Algorithmic Classification of Paraphasias

> **NIH R01** · OREGON HEALTH & SCIENCE UNIVERSITY · 2021 · $294,197

## Abstract

This application’s parent grant, R01DC015999, is focused on the development of automated
systems for identifying and categorizing paraphasic speech errors in language samples from
individuals with post-stroke aphasia, both in confrontation naming tests and in connected
speech. Current approaches require that language samples be manually
transcribed, which is both time-consuming and error-prone, and limits the clinical applicability of
the technology. Since the parent grant was written, there have been major improvements in
automatic speech recognition (ASR) technology, and it may soon be possible to automate this
transcription step. This would open many new avenues for applying automated systems of the
sort developed under the parent grant, both in clinical and research settings. However, these
promising new ASR techniques depend on large, carefully annotated datasets of a sort
that does not currently exist for aphasic speech. Under this administrative supplement, we
propose to address this gap by carrying out an extensive campaign of transcription and detailed
annotation of an existing, publicly available library of audio recordings of aphasic
speech, including both structured naming tests and discourse samples. In addition to phonemic
transcription of the utterances themselves, we will annotate other features of aphasic speech (false
starts, disfluencies, etc.) so as to support the development of automated algorithms for
analyzing such speech. Our interdisciplinary team of machine learning researchers and
aphasiologists will collaborate closely to produce a curated dataset of the sort needed to
develop, train, and evaluate modern machine learning techniques for speech recognition.
Importantly, the resulting dataset will be documented and organized in a manner similar to other
large-scale ASR datasets, and will be released publicly to both the clinical and machine learning
communities. In order to raise awareness of the dataset (and of this problem space in general)
within the machine learning community, we further propose to organize a shared evaluation
task, in which participating teams will make use of our final dataset to build automated
transcription systems for naming tests, which will be compared in a “bakeoff” setting.

## Key facts

- **NIH application ID:** 10411534
- **Project number:** 3R01DC015999-04S1
- **Recipient organization:** OREGON HEALTH & SCIENCE UNIVERSITY
- **Principal Investigator:** Steven Bedrick
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $294,197
- **Award type:** 3
- **Project period:** 2018-09-01 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10411534

## Citation

> US National Institutes of Health, RePORTER application 10411534, Algorithmic Classification of Paraphasias (3R01DC015999-04S1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10411534. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*