# Natural audiovisual speech encoding in the early stages of the human cortical hierarchy

> **NIH R01** · UNIVERSITY OF ROCHESTER · 2022 · $385,000

## Abstract

Speech is central to human life. Yet how the human brain processes speech in complex everyday situations
remains poorly understood. One prominent idea is that speech perception is carried out using brain areas and
mechanisms that are used for processing sounds more generally. And it has been suggested that these
mechanisms become specialized for speech through learning, resulting in a speech processing network in the
brain that processes increasingly complex aspects of the speech signal at successive hierarchical stages. But
questions about the function of this hierarchy remain. In particular, while it is commonly acknowledged that
seeing a speaker’s face in noisy environments can improve comprehension, our understanding of how visual
speech influences the hierarchical processing of speech remains limited. This is unfortunate as speech
processing, and multisensory speech processing in particular, has been reported to be affected in a number of
clinical disorders, including autism and schizophrenia. Thus, as well as contributing to our understanding of this
most fundamental of human abilities, better knowledge of the neural mechanisms underpinning audiovisual
speech processing could have important clinical research implications. One of the principal reasons for our lack
of knowledge on the neurophysiology of audiovisual speech is the technical challenge associated with indexing
the neural processing of natural speech with high temporal resolution and at multiple levels of the speech
processing hierarchy. Non-human primates represent a less than perfect model for studying human speech
processing, the hemodynamic changes underlying functional magnetic resonance imaging are too slow to track
natural speech dynamics, and electrocorticography samples only a limited number of brain areas and cannot be
broadly applied in clinical research. Recently, our group has introduced several new approaches for indexing
natural speech processing using electroencephalography (EEG). These include entirely novel frameworks for
producing dependent measures of the hierarchical encoding of natural speech, and for quantifying multisensory
integration of natural audiovisual speech. The present proposal seeks to exploit this opportunity to test the
hypothesis that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize
comprehension based on the current listening conditions. Across three objectives the proposal aims to
characterize this flexibility by determining how the hierarchical processing stage at which visual and audio
speech are integrated varies as a function of 1) the listening environment, 2) the visual information available and
3) the deployment of attention. The work promises to bring a new depth of understanding to the perception of
one of humanity’s most essential signals. And it will introduce several novel analyses and experimental
paradigms that should be easily deployable in tackling research on ...

## Key facts

- **NIH application ID:** 10357771
- **Project number:** 5R01DC016297-05
- **Recipient organization:** UNIVERSITY OF ROCHESTER
- **Principal Investigator:** Edmund Lalor
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $385,000
- **Award type:** 5
- **Project period:** 2018-03-15 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10357771

## Citation

> US National Institutes of Health, RePORTER application 10357771, Natural audiovisual speech encoding in the early stages of the human cortical hierarchy (5R01DC016297-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10357771. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
