Natural audiovisual speech encoding in the early stages of the human cortical hierarchy

NIH RePORTER · NIH · R01 · $385,000

Abstract

PROJECT SUMMARY Speech is central to human life. Yet how the human brain processes speech in complex everyday situations remains poorly understood. One prominent idea is that speech perception is carried out using brain areas and mechanisms that are used for processing sounds more generally. It has also been suggested that these mechanisms become specialized for speech through learning, resulting in a speech processing network in the brain that processes increasingly complex aspects of the speech signal at successive hierarchical stages. But questions about the function of this hierarchy remain. In particular, while it is commonly acknowledged that seeing a speaker’s face in noisy environments can improve comprehension, how visual speech influences the hierarchical processing of speech remains unclear. This is unfortunate, as speech processing, and multisensory speech processing in particular, has been reported to be affected in a number of clinical disorders, including autism and schizophrenia. Thus, as well as contributing to our understanding of this most fundamental of human abilities, better knowledge of the neural mechanisms underpinning audiovisual speech processing could have important clinical research implications. One of the principal reasons for our lack of knowledge on the neurophysiology of audiovisual speech is the technical challenge associated with indexing the neural processing of natural speech with high temporal resolution and at multiple levels of the speech processing hierarchy. Non-human primates represent a less-than-perfect model for studying human speech processing; the hemodynamic changes underlying functional magnetic resonance imaging are too slow to track natural speech dynamics; and electrocorticography samples only a limited number of brain areas and cannot be broadly applied in clinical research.
Recently, our group has introduced several new approaches for indexing natural speech processing using electroencephalography (EEG). These include entirely novel frameworks for producing dependent measures of the hierarchical encoding of natural speech, and for quantifying multisensory integration of natural audiovisual speech. The present proposal seeks to exploit this opportunity to test the hypothesis that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension based on the current listening conditions. Across three objectives, the proposal aims to characterize this flexibility by determining how the hierarchical processing stage at which visual and audio speech are integrated varies as a function of 1) the listening environment, 2) the visual information available, and 3) the deployment of attention. The work promises to bring a new depth of understanding to the perception of one of humanity’s most essential signals. It will also introduce several novel analyses and experimental paradigms that should be easily deployable in tackling research on ...

Key facts

NIH application ID: 10357771
Project number: 5R01DC016297-05
Recipient: UNIVERSITY OF ROCHESTER
Principal Investigator: Edmund Lalor
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $385,000
Award type: 5
Project period: 2018-03-15 → 2024-02-29