Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics

Authors

Forkel, Robert
List, Johann Mattis
Greenhill, Simon
Rzymski, Christoph
Bank, Sebastian
Cysouw, Michael
Hammarström, Harald
Haspelmath, Martin
Kaiping, Gereon
Gray, Russell D.

Publisher

Nature Publishing Group

Abstract

The amount of digital data available for the languages of the world is constantly increasing. Unfortunately, most of these data are provided in a wide variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists and structural datasets), together with a framework for incorporating further data types (e.g. parallel texts and dictionaries). The new specification for cross-linguistic data formats comes with a software package for validation and manipulation, a basic ontology that links to more general frameworks, and usage examples illustrating best practices.

Source

Scientific Data

Access Statement

Open Access

License Rights

Creative Commons Attribution 4.0 International License