# Using the RDoC Approach to Understand Thought Disorder: A Linguistic Corpus-Based Approach

> **NIH R01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2020 · $533,776

## Abstract

Thought disorder in psychotic disorders and their risk states has typically been evaluated using clinical
rating scales, and occasionally labor-intensive manual methods of linguistic analysis. We propose instead to
use a novel automated linguistic corpus-based approach to language analysis informed by artificial
intelligence. The method derives the semantic meaning of words and phrases by drawing on a large corpus of
text, similar to how humans assign meaning to language, and leads to measures of semantic coherence from
one phrase to the next. It also evaluates syntactic complexity through “part-of-speech” tagging and analysis of
speech graphs. These analyses yield fine-grained indices of speech semantics and syntax that may more
accurately capture thought disorder.

Using these automated methods of speech analysis, in collaboration with computer scientists from IBM,
we identified a classifier with high accuracy for psychosis onset in a small clinical high-risk (CHR) cohort, which included
decreased semantic coherence from phrase to phrase, and decreased syntactic complexity, including
shortened phrase length and decreased use of determiner pronouns (“which”, “what”, “that”). These features
correlated with prodromal symptoms but outperformed them in classification accuracy. They also discriminated
schizophrenia from normal speech. We further cross-validated this automated approach in a second small
CHR cohort, identifying a semantics/syntax classifier that classified psychosis outcome in both cohorts, and
discriminated speech in recent-onset psychosis patients from normal speech.

These automated linguistic analytic methods hold great promise, but their use thus far has been
circumscribed to only a few small studies that aim to discriminate schizophrenia from the norm, and in our own
work, predict psychosis. There is a critical gap in our understanding of the linguistic mechanisms that underlie
thought disorder. To address this gap, in response to PAR-16-136, we propose to use the RDoC construct of
language production, and its linguistic corpus-based analytic paradigm, to study thought disorder dimensionally
and transdiagnostically, in a large cohort of 150 putatively healthy volunteers, 150 CHR patients, and 150
recent-onset psychosis patients. We expect that latent semantic analysis will yield measures of semantic
coherence that index positive thought disorder (tangentiality, derailment), whereas part-of-speech (POS)
tagging/speech graphs will yield measures of syntactic complexity that index negative thought disorder
(concreteness, poverty of content).

This large language dataset will be obtained from two PSYSCAN/HARMONY sites, such that these
language data will be available for secondary analyses with PSYSCAN/HARMONY imaging and EEG data to
study language production at the circuit and physiological levels. This large language and clinical dataset will
also be archived at NIH for further linguistic analyses by other investigators.
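
The abstract describes semantic coherence "from one phrase to the next" without giving a formula. A minimal sketch of one common reading, assuming each phrase is represented by the average of its word vectors and coherence is the cosine similarity between consecutive phrases, might look like the following. The vectors in `TOY_VECTORS` and the function names are hypothetical stand-ins for illustration; they are not the project's corpus-derived embeddings or its actual pipeline.

```python
import math

# Hypothetical toy word vectors. In practice these would be derived from a
# large text corpus (for example by latent semantic analysis); the values
# here are made up purely for illustration.
TOY_VECTORS = {
    "dog": (0.9, 0.1, 0.0),
    "barked": (0.8, 0.2, 0.1),
    "cat": (0.85, 0.15, 0.05),
    "ran": (0.7, 0.3, 0.1),
    "stocks": (0.0, 0.1, 0.9),
    "fell": (0.1, 0.2, 0.8),
}


def phrase_vector(phrase):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in phrase.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_order_coherence(phrases):
    """Similarity of each phrase to the next, skipping phrases with no vector."""
    vectors = [phrase_vector(p) for p in phrases]
    return [
        cosine(u, v)
        for u, v in zip(vectors, vectors[1:])
        if u is not None and v is not None
    ]


transcript = ["the dog barked", "the cat ran", "stocks fell"]
scores = first_order_coherence(transcript)
print(scores)
```

In this toy transcript the first pair of phrases stays on topic and the second pair shifts abruptly, so the second score is much lower than the first. Summary statistics over such a series (for instance its minimum or mean) are the kind of feature a classifier could consume, though the features actually used in the project are not specified here.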

## Key facts

- **NIH application ID:** 9859468
- **Project number:** 5R01MH115332-03
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** CHERYL MARY CORCORAN
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $533,776
- **Award type:** 5
- **Project period:** 2018-03-16 → 2023-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9859468

## Citation

> US National Institutes of Health, RePORTER application 9859468, Using the RDoC Approach to Understand Thought Disorder: A Linguistic Corpus-Based Approach (5R01MH115332-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9859468. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*