# Cortical tracking of speech-specific temporal structure in familiar vs. foreign speech

> **NIH R21** · DUKE UNIVERSITY · 2020 · $159,000

## Abstract

Speech carries rich linguistic information over a large range of temporal scales: the average durations of
phonemes, syllables, words, and sentences range from tens of milliseconds to multiple seconds.
Thus, to achieve successful speech perception, the acoustic speech signal needs to be analyzed over
appropriate temporal scales to interface with their respective linguistic representations. Where and how this
acousto-linguistic mapping of temporal speech properties occurs is still not fully explained in current
speech/language models. Here, we investigate how cortical processing of acoustic temporal structure in speech is
modulated by higher-level linguistic analysis.

Doing so requires two essential features: (1) control over the temporal scale at which analysis occurs and (2) control
over the linguistic content of the information. For (1), we use a novel sound-quilting algorithm that controls the
temporal structure in speech at different temporal scales by shuffling and then stitching together speech
segments of a certain length; this approach yields new ‘speech quilt’ signals that preserve the natural temporal
structure in the original source signal only up to the set segment length, but not beyond. The segment lengths
(30, 120, 480, and 960 ms) are chosen to span the typical temporal range of phonemes, syllables, and words.
For (2), we manipulate speech familiarity by using recordings of bilingual speakers, reading from a book in
English and Korean, as the source signal to create speech quilts in two languages. This approach ensures that
any changes at the signal-acoustics level affect both languages identically, while manipulating the linguistic
percept differently. Thus, neural responses that vary as a function of segment length but are shared or similar
across the two languages will suggest analysis at the signal-acoustics level, whereas neural responses that
differ based on language familiarity will imply the presence of linguistic processes.

In Aim 1, we test (using fMRI) the hypothesis that temporal acoustic structure in speech is extracted in superior temporal
sulcus (STS) for both languages; linguistic processes, originating in inferior frontal gyrus (IFG), become
engaged in a familiar language only and in turn modulate such signal-acoustics level analyses in anterior and
posterior STS. In Aim 2, we capitalize on the high temporal resolution of EEG to test whether one potential
neural mechanism for the results in Aim 1 is that neurons phase-lock more strongly to the speech quilt
signal as its natural temporal structure increases (longer segment lengths), which in turn is again modulated
and enhanced by speech familiarity.

The results will have a significant impact on speech/language models that need to account for where and how
specific temporal scales in speech interface with their linguistic representations, while also informing
approaches towards clinical populations such as children struggling to decode critical temporal speech units,...
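
The quilting step described in the abstract (cut the source recording into segments of a set length, shuffle them, and stitch them back together) can be illustrated with a minimal Python sketch. This is not the project's algorithm: the function name `make_quilt`, its parameters, and the plain random shuffle are assumptions made for illustration, and the sketch does nothing to smooth the joins between segments, which a real quilting procedure would likely need to handle.

```python
import random


def make_quilt(samples, sample_rate_hz, segment_ms, seed=0):
    """Return a 'quilt': fixed-length segments of `samples` in shuffled order.

    Hypothetical helper for illustration only. Structure within each segment
    is preserved; structure across segment boundaries is destroyed.
    """
    # Segment length in samples (at least one sample per segment).
    seg_len = max(1, round(sample_rate_hz * segment_ms / 1000))

    # Cut into consecutive, equal-length segments; a trailing remainder
    # shorter than one segment is dropped.
    n_segments = len(samples) // seg_len
    segments = [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

    # Shuffle segment order with a seeded generator so the quilt is reproducible.
    rng = random.Random(seed)
    rng.shuffle(segments)

    # Stitch the shuffled segments back into one signal (no crossfading here).
    return [x for segment in segments for x in segment]


if __name__ == "__main__":
    # Toy "recording": one second of sample indices at an assumed 16 kHz rate.
    signal = list(range(16000))
    # Segment lengths taken from the abstract.
    for segment_ms in (30, 120, 480, 960):
        quilt = make_quilt(signal, 16000, segment_ms)
        print(segment_ms, "ms ->", len(quilt), "samples")
```

Because the same shuffle can be applied to English and Korean recordings alike, a sketch like this makes the design logic concrete: the acoustic manipulation depends only on `segment_ms`, not on the language of the source signal.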

## Key facts

- **NIH application ID:** 9880421
- **Project number:** 5R21DC016386-03
- **Recipient organization:** DUKE UNIVERSITY
- **Principal Investigator:** Jan Tobias Overath
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $159,000
- **Award type:** 5
- **Project period:** 2018-03-05 → 2022-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9880421

## Citation

> US National Institutes of Health, RePORTER application 9880421, Cortical tracking of speech-specific temporal structure in familiar vs. foreign speech (5R21DC016386-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9880421. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*