# Computational analysis of child language transcript data

> **NIH R01** · CARNEGIE-MELLON UNIVERSITY · 2020 · $320,332

## Abstract

The Child Language Data Exchange System (CHILDES) Project seeks to broaden and
deepen our scientific understanding of language development by providing new ways of
analyzing real-world face-to-face interactions. The computational tools developed in
previous phases of the project constitute the primary methodological basis for new
empirical research on how children develop spontaneous use of a first language. This
work has resulted in more than 8,000 published articles examining all aspects
of language development, including word learning, sound learning, grammatical
development, and communicative development. All of the programs and data sets are
provided over the web without charge to researchers. The database that has been
collected using these tools is now the largest spoken language database available
anywhere. However, we can achieve still greater efficiency and analytic precision by
building even more powerful computational tools. The next phase of this project will
develop new techniques to support analytic methods in the study of language
development. These methods include rapid computer-assisted transcription of
interactions, diarization of daylong audio recordings made in the home, automatic
analysis of morphological and syntactic structures, a simple user interface for searches,
web-based support for collaborative commentary between research groups, construction
of standard comparison group norms, and methods for moving data between different
programs for alternative analyses. In addition, we will promote the use of the database
and programs by constructing web-based tutorials, by improving the current user
interface, and by conducting workshops and presentations at conferences.

## Key facts

- **NIH application ID:** 9987690
- **Project number:** 5R01HD082736-17
- **Recipient organization:** CARNEGIE-MELLON UNIVERSITY
- **Principal Investigator:** BRIAN MACWHINNEY
- **Activity code:** R01 (Research Project Grant)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $320,332
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2007-05-01 → 2024-07-31
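The project number in the list above packs several of the other fields into one string. The following is a minimal sketch, assuming the commonly documented NIH layout: a one-digit application type, a three-character activity code, a two-letter institute/center code, a six-digit serial number, and a two-digit support year after the hyphen, optionally followed by a suffix such as `A1`. The names `parse_project_number` and `ProjectNumber` are hypothetical and not part of any NIH tool or API.

```python
import re
from dataclasses import dataclass

# Assumed layout of a full NIH project number, e.g. "5R01HD082736-17":
# application type | activity code | institute/center code | serial | "-" support year | optional suffix
_PATTERN = re.compile(
    r"^(?P<type>\d)"
    r"(?P<activity>[A-Z0-9]{3})"
    r"(?P<ic>[A-Z]{2})"
    r"(?P<serial>\d{6})"
    r"-(?P<year>\d{2})"
    r"(?P<suffix>[A-Z0-9]*)$"
)


@dataclass(frozen=True)
class ProjectNumber:
    """Hypothetical container for the pieces of a project number."""
    application_type: int
    activity_code: str
    ic_code: str
    serial: str  # kept as a string to preserve leading zeros
    support_year: int
    suffix: str  # empty string when no suffix is present


def parse_project_number(text: str) -> ProjectNumber:
    """Split a project number into its fields, or raise ValueError."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized project number: {text!r}")
    return ProjectNumber(
        application_type=int(match["type"]),
        activity_code=match["activity"],
        ic_code=match["ic"],
        serial=match["serial"],
        support_year=int(match["year"]),
        suffix=match["suffix"],
    )


if __name__ == "__main__":
    print(parse_project_number("5R01HD082736-17"))
```

Applied to `5R01HD082736-17`, this yields application type 5, activity code `R01`, institute code `HD`, serial `082736`, and support year 17, which agrees with the award type and activity code listed above. The regular expression uses fixed-width groups, so the activity code and institute code cannot be confused even though both are alphabetic.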

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9987690

## Citation

> US National Institutes of Health, RePORTER application 9987690, Computational analysis of child language transcript data (5R01HD082736-17). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9987690. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
