# Computational analysis of child language transcript data

> **NIH R01** · CARNEGIE-MELLON UNIVERSITY · 2024 · $1,447,674

## Abstract

The Child Language Data Exchange System (CHILDES) Project seeks to
broaden and deepen our scientific understanding of language development by
providing new ways of analyzing real-world, face-to-face interactions. The
computational tools developed in previous phases of the project now
constitute the primary methodological basis for new empirical research on how
children develop the spontaneous use of a first language. This work has resulted
in over 10,000 published articles examining all aspects of language development,
including word learning, sound learning, grammatical development, and
communicative development. All programs and datasets are provided to
researchers over the web free of charge. The database collected with these
tools is now the largest database of natural spoken-language interactions
available anywhere. However, we can achieve still greater efficiency and
analytic precision by building even more powerful computational tools. The next
phase of this project will develop new techniques to support analytic methods in
the study of language development. These methods include rapid
computer-assisted transcription of interactions, diarization of daylong audio
recordings made in the home, automatic analysis of morphological and syntactic
structures, a simple user interface for searches, web-based support for
collaborative commentary between research groups, construction of standard
comparison-group norms, and methods for moving data between programs to
support alternative analyses. In addition, we will promote the use of the
database and programs by constructing web-based tutorials, improving the
current user interface, and conducting workshops and presentations at
conferences.

## Key facts

- **NIH application ID:** 10954349
- **Project number:** 2R01HD082736-21
- **Recipient organization:** CARNEGIE-MELLON UNIVERSITY
- **Principal Investigator:** BRIAN MACWHINNEY
- **Activity code:** R01 (Research Project Grant)
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $1,447,674
- **Award type:** 2 (competing renewal)
- **Project period:** 2007-05-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10954349

## Citation

> US National Institutes of Health, RePORTER application 10954349, Computational analysis of child language transcript data (2R01HD082736-21). Retrieved via AI Analytics 2026-06-01 from https://api.ai-analytics.org/grant/nih/10954349. Licensed CC0.

