# Characterizing the recovery of spectral, temporal, and phonemic speech information from visual cues

> **NIH R01** · UNIVERSITY OF MICHIGAN AT ANN ARBOR · 2024 · $524,946

## Abstract

Auditory speech perception is essential for social, vocational, and emotional health in hearing individuals.
However, the reliability of auditory signals varies widely in everyday settings (e.g., at a crowded party), requiring
supplemental processes to enable accurate speech perception. The principal mechanisms that support the
perception of degraded auditory speech signals are auditory-visual (crossmodal) interactions, which can
perceptually restore speech content using visual cues provided by lipreading, rhythmic articulatory movements,
and the natural correlations present between oral resonance and mouth shape. Moreover, receptive speech
processes can be limited through a variety of causes, including intrinsic brain tumor, stroke, cochlear implant
usage, and age-related hearing loss, making compensatory crossmodal mechanisms necessary for one to
continue working and maintaining healthy social interactions. However, the physiological processes that enable
vision to facilitate speech perception remain poorly understood and no integrative model exists for how these
multiple visual dimensions combine to enhance auditory speech perception. In the auditory domain, distributed
populations of neurons encode spectro-temporal information about acoustic cues that are then transcoded into
phonemes. We propose a dual-route perceptual model through which visual signals integrate with phoneme-coded
neurons. The first route is a direct path through which viseme-to-phoneme conversions generate semi-overlapping
distributions of activity in the superior temporal gyrus, leading to improved hearing through improved auditory
phoneme tuning functions. The second is an indirect path through which visual features restore spectral information
about speech frequencies and alter phoneme-response timing, resulting in improved auditory spectro-temporal
profiles (which in turn are transcoded into phonemes with greater precision). Finally, we will examine the
hypothesis that our perceptual system optimizes which of these visual dimensions is prioritized for recovery
based on what is missing from the auditory signal. These studies will provide a unified framework for how speech
perception benefits from different visual signals. By understanding biological approaches to crossmodally
restoring degraded auditory speech information, we can develop better targeted rehabilitation programs and
neural prostheses to maximize speech perception recovery after trauma or during healthy aging.

## Key facts

- **NIH application ID:** 10786104
- **Project number:** 5R01DC020717-02
- **Recipient organization:** UNIVERSITY OF MICHIGAN AT ANN ARBOR
- **Principal Investigator:** David Brang
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $524,946
- **Award type:** 5
- **Project period:** 2023-02-14 → 2028-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10786104

## Citation

> US National Institutes of Health, RePORTER application 10786104, Characterizing the recovery of spectral, temporal, and phonemic speech information from visual cues (5R01DC020717-02). Retrieved via AI Analytics 2026-06-12 from https://api.ai-analytics.org/grant/nih/10786104. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
