On convenience, diversity, and generalisability: A commentary on Scaff et al. (2025)

Authors

Kidd, Evan
Garcia, Rowena

Abstract

The Child Language Data Exchange System (CHILDES, MacWhinney 2000) is the jewel in the crown of child language research. Emerging in the 1980s to archive and facilitate the sharing and re-use of precious and laborious-to-collect-and-process corpus data (MacWhinney and Snow 1985), its forward-thinking ethos predated the modern Open Science movement by decades. Thanks to the hard work of Brian MacWhinney and others, it has continued to expand, has birthed similar repositories (AphasiaBank, Forbes et al. 2012; HomeBank, VanDam et al. 2016), and no doubt inspired others (e.g., WordBank, Frank et al. 2017). Progress in the field of child language acquisition has unquestionably accelerated because of its existence. Yet, as Scaff et al. (2025) show in their paper, the data in CHILDES are not fully representative of the languages of the world and the children who learn them. Despite containing corpora on many dozens of languages, those languages are predominantly Indo-European, with the data mostly coming from affluent urban nuclear families in wealthy countries. They conclude that, because of this skew in the data, researchers should be mindful of generalising from the data.

Source

Developmental Science

Entity type

Publication
