Digitizing Human Vocal Interaction to Understand and Diagnose Autism

NIH RePORTER · NIH · R01 · $589,454

Abstract

This proposal tackles an urgent need for sensitive clinical outcome measures of autism spectrum disorder (ASD) by developing an objective, digital, multi-modal social communication metric using computational linguistics (e.g., acoustic features, turn-taking rates, word frequency metrics). Our automatic speech recognition and natural language analytics approach is designed to fix known weaknesses in traditional measurements by providing granular information in less time, with built-in scalability for characterizing very large samples. Because ASD is defined by observable behaviors, it is ripe for an automated approach to digitizing behavior (e.g., words, sounds, facial expressions, motor behaviors). This proposal builds on a recently funded R01 that uses computer vision and machine learning to characterize nonverbal motor synchrony in teens with either ASD or another disorder in a brief social conversation (MH118327, PI: Schultz). Vocal components of the conversation are not studied in MH118327; thus, the richness of the verbal domain is left untapped. We hypothesize that automatically derived spoken language markers will significantly predict group and individual differences in social communication skill and, when fused with nonverbal features, will lead to better prediction than either modality alone. Together, these two projects represent a rare chance to study all observable social signals emitted during social interaction in the same diverse sample of participants. If funded, this project will be the first to use short conversations and multi-modal data fusion to predict social communication skill and diagnostic group in a large, clinically diverse sample of individuals with ASD and other disorders. Our pilot studies showed that a relatively small set of vocal features from a six-minute interaction predicts diagnosis (ASD vs. typical development [TD]) with 84% accuracy.
These machine learning analyses also predicted social communication skill dimensionally, providing a granular metric of individual differences. Combining this approach with nonverbal metrics (R01MH118327) using decision-level data fusion resulted in significantly better ASD vs. TD prediction (91% accuracy). These pilot results are promising, but several gaps remain. In Aim 1 of this proposal, we assess the specificity of our vocal social communication approach by including a non-ASD psychiatric control group in our machine learning classification models, in addition to ASD and TD groups (N=250/group). In Aim 2, we clinically validate our transdiagnostic dimensional metric in a large, diverse sample of participants. In Aim 3, we test whether novel, sophisticated multi-modal fusion methods that combine vocal and nonverbal social communication features result in improved individual and group prediction. This proposal lays critical groundwork for an automated, precision medicine approach to studying, diagnosing, and caring for individuals with ASD and other mental health cond...
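
The abstract mentions decision-level data fusion without describing the method used. As an illustration only, the sketch below shows one common form of decision-level (late) fusion: each modality's classifier outputs its own probability per participant, and the two probabilities are combined by a weighted average before thresholding. The function names, the equal default weight, the 0.5 threshold, and the example probabilities are all assumptions made for this sketch; they are not taken from the project and do not represent its models or data.

```python
from typing import List, Sequence


def fuse_decisions(p_vocal: Sequence[float],
                   p_nonverbal: Sequence[float],
                   w_vocal: float = 0.5) -> List[float]:
    """Weighted average of per-participant ASD probabilities.

    p_vocal and p_nonverbal are hypothetical outputs of two separately
    trained classifiers (one per modality); w_vocal is an assumed weight.
    """
    if len(p_vocal) != len(p_nonverbal):
        raise ValueError("both modalities must score the same participants")
    if not 0.0 <= w_vocal <= 1.0:
        raise ValueError("w_vocal must lie in [0, 1]")
    return [w_vocal * pv + (1.0 - w_vocal) * pn
            for pv, pn in zip(p_vocal, p_nonverbal)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label each fused probability as ASD or TD (threshold is an assumption)."""
    return ["ASD" if p >= threshold else "TD" for p in probs]


# Made-up probabilities for two participants, for illustration only.
fused = fuse_decisions([0.75, 0.25], [0.25, 0.25])
labels = classify(fused)
```

Because fusion happens on classifier outputs rather than on raw features, each modality can be modeled and validated independently before the two are combined. Feature-level fusion, which concatenates vocal and nonverbal features into a single model, is a different design that this sketch does not cover.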

Key facts

NIH application ID: 9866358
Project number: 1R01DC018289-01
Recipient: CHILDREN'S HOSP OF PHILADELPHIA
Principal Investigator: Julia Parish-Morris
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $589,454
Award type: 1
Project period: 2020-06-01 → 2025-05-31