# Digitizing Human Vocal Interaction to Understand and Diagnose Autism

> **NIH R01** · CHILDREN'S HOSP OF PHILADELPHIA · 2024 · $669,292

## Abstract

This proposal tackles an urgent need for sensitive clinical outcome measures of autism spectrum disorder (ASD)
by developing an objective, digital, multi-modal social communication metric using computational linguistics
(e.g., acoustic features, turn-taking rates, word frequency metrics). Our automatic speech recognition and natural
language analytics approach is designed to fix known weaknesses in traditional measurements by providing
granular information in less time, with built-in scalability for characterizing very large samples. Since ASD is
defined by observables, it is ripe for an automated approach to digitizing behavior (e.g., words, sounds, facial
expressions, motor behaviors). This proposal piggybacks on a recently funded R01 that uses computer vision
and machine learning to characterize nonverbal motor synchrony in teens with either ASD or another disorder
in a brief social conversation (MH118327, PI: Schultz). Vocal components of the conversation are not studied in
MH118327; thus, the richness of the verbal domain is left untapped. We hypothesize that automatically derived
spoken language markers will significantly predict group and individual differences in social communication skill,
and, when fused with nonverbal features, will lead to better prediction than either modality alone. Together,
these two projects represent a rare chance to study all observable social signals emitted during social interaction
in the same diverse sample of participants. If funded, this project will be the first to use short conversations and
multi-modal data fusion to predict social communication skill and diagnostic group in a large, clinically diverse
sample of individuals with ASD and other disorders. Our pilot studies showed that a relatively small set of vocal
features from a six-minute interaction predicts diagnosis (ASD vs. typical development [TD]) with 84% accuracy.
These machine learning analyses also predicted social communication skill dimensionally, providing a granular
metric of individual differences. Combining this approach with nonverbal metrics (R01MH118327) using decision
level data fusion resulted in significantly better ASD vs. TD prediction (91% accuracy). These pilot results are
promising, but several gaps remain. In Aim 1 of this proposal, we assess the specificity of our vocal social
communication approach by including a non-ASD psychiatric control group in our machine learning classification
models, in addition to ASD and TD groups (N=250/group). In Aim 2, we clinically validate our transdiagnostic
dimensional metric in a large, diverse sample of participants. In Aim 3, we test whether novel, sophisticated
multi-modal fusion methods that combine vocal and nonverbal social communication features result in improved
individual and group prediction. This proposal lays critical groundwork for an automated, precision medicine
approach to studying, diagnosing, and caring for individuals with ASD and other mental health cond...
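The abstract lists turn-taking rates and word frequency metrics among its features but does not define them. The sketch below shows one plausible way to compute a few such features for one speaker from a diarized, time-stamped ASR transcript. The `Segment` format, the `conversation_features` function, the `"participant"`/`"partner"` labels, and every feature definition are hypothetical illustrations, not the project's pipeline. Acoustic features such as pitch or intensity would require audio processing and are omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized utterance (hypothetical format): speaker label, start/end in seconds, ASR text."""
    speaker: str
    start: float
    end: float
    text: str


def conversation_features(segments, target="participant"):
    """Illustrative turn-taking and lexical features for one speaker in a two-party conversation."""
    if not segments:
        raise ValueError("need at least one segment")
    segs = sorted(segments, key=lambda s: s.start)
    span_s = segs[-1].end - segs[0].start  # conversation length in seconds

    # A new turn begins whenever the speaker label changes between consecutive segments;
    # the first segment always opens a turn.
    turn_starts = [segs[0]] + [cur for prev, cur in zip(segs, segs[1:]) if cur.speaker != prev.speaker]
    n_target_turns = sum(1 for t in turn_starts if t.speaker == target)

    # Response latency: gap between the partner's segment ending and the target's turn starting
    # (a negative value would indicate overlapping speech).
    latencies = [cur.start - prev.end for prev, cur in zip(segs, segs[1:])
                 if cur.speaker == target and prev.speaker != target]

    # Simple word-frequency features from whitespace-tokenized, lower-cased ASR output.
    words = [w.lower() for s in segs if s.speaker == target for w in s.text.split()]
    vocab = Counter(words)
    talk_s = sum(s.end - s.start for s in segs if s.speaker == target)

    return {
        "turns_per_min": n_target_turns / (span_s / 60.0),
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else float("nan"),
        "word_count": len(words),
        "type_token_ratio": len(vocab) / len(words) if words else 0.0,
        "talk_time_share": talk_s / span_s,
        "top_words": vocab.most_common(3),
    }


# Toy four-segment exchange (made-up text, not study data).
segments = [
    Segment("partner", 0.0, 3.0, "hi how was your weekend"),
    Segment("participant", 3.5, 6.0, "it was good I went hiking"),
    Segment("partner", 6.2, 8.0, "oh nice where did you go"),
    Segment("participant", 9.0, 12.0, "it was a trail by the river"),
]
print(conversation_features(segments))
```

Features like these would then be aggregated per participant and passed to a classifier or regression model. The choice of features and models in the actual study is not specified in the abstract.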
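The abstract reports that decision-level fusion of vocal and nonverbal predictions improved ASD vs. TD accuracy, and Aim 1 extends classification to a third, non-ASD psychiatric group. Here is a minimal sketch of generic decision-level (late) fusion, assuming each modality's model already outputs per-class probabilities for every participant. The fused probabilities are a weighted average, and the predicted group is the argmax. The group labels, the `w_vocal` weight, the helper functions, and the toy numbers are all hypothetical. This is a common baseline, not the project's fusion method.

```python
# Hypothetical label set: ASD, typical development (TD), non-ASD psychiatric control (PSY).
GROUPS = ("ASD", "TD", "PSY")


def fuse_decisions(vocal_probs, nonverbal_probs, w_vocal=0.5):
    """Decision-level (late) fusion: weighted average of per-class probabilities
    produced separately by a vocal model and a nonverbal model."""
    if not 0.0 <= w_vocal <= 1.0:
        raise ValueError("w_vocal must be in [0, 1]")
    if len(vocal_probs) != len(nonverbal_probs):
        raise ValueError("both models must score the same participants")
    return [
        [w_vocal * pv + (1.0 - w_vocal) * pn for pv, pn in zip(row_v, row_n)]
        for row_v, row_n in zip(vocal_probs, nonverbal_probs)
    ]


def predict(probs):
    """Pick the group with the highest probability for each participant."""
    return [GROUPS[max(range(len(row)), key=row.__getitem__)] for row in probs]


def accuracy(pred, truth):
    """Fraction of participants whose predicted group matches the true group."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Toy per-class probabilities for four participants (made-up numbers, not study data).
vocal = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.2, 0.7, 0.1]]
nonverbal = [[0.3, 0.4, 0.3], [0.7, 0.2, 0.1], [0.3, 0.2, 0.5], [0.3, 0.6, 0.1]]
truth = ["ASD", "ASD", "PSY", "TD"]

for name, probs in [("vocal", vocal), ("nonverbal", nonverbal),
                    ("fused", fuse_decisions(vocal, nonverbal))]:
    print(name, accuracy(predict(probs), truth))
# prints: vocal 0.75, then nonverbal 0.75, then fused 1.0
```

The toy numbers are constructed so that each modality errs on a different participant, which is the situation where late fusion helps. An equal weight is the simplest choice. In practice the weight, or a stacked meta-classifier, would be fit on held-out data. Aim 3 describes its fusion methods only as "novel, sophisticated", so they likely go beyond this baseline.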

## Key facts

- **NIH application ID:** 10844451
- **Project number:** 5R01DC018289-05
- **Recipient organization:** CHILDREN'S HOSP OF PHILADELPHIA
- **Principal Investigator:** Julia Parish-Morris
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $669,292
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2020-06-01 → 2026-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10844451

## Citation

> US National Institutes of Health, RePORTER application 10844451, Digitizing Human Vocal Interaction to Understand and Diagnose Autism (5R01DC018289-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10844451. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*