
Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics

Forkel, Robert; List, Johann Mattis; Greenhill, Simon; Rzymski, Christoph; Bank, Sebastian; Cysouw, Michael; Hammarström, Harald; Haspelmath, Martin; Kaiping, Gereon; Gray, Russell D.

Description

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts and dictionaries).

dc.contributor.author: Forkel, Robert
dc.contributor.author: List, Johann Mattis
dc.contributor.author: Greenhill, Simon
dc.contributor.author: Rzymski, Christoph
dc.contributor.author: Bank, Sebastian
dc.contributor.author: Cysouw, Michael
dc.contributor.author: Hammarström, Harald
dc.contributor.author: Haspelmath, Martin
dc.contributor.author: Kaiping, Gereon
dc.contributor.author: Gray, Russell D.
dc.date.accessioned: 2019-11-25T04:14:33Z
dc.date.available: 2019-11-25T04:14:33Z
dc.identifier.issn: 2052-4463
dc.identifier.uri: http://hdl.handle.net/1885/186596
dc.description.abstract: The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.
dc.description.sponsorship: As part of the CLLD project (cf. http://clld.org) and the Glottobank project (cf. http://glottobank.org), the work was supported by the Max Planck Society, the Max Planck Institute for the Science of Human History, and the Royal Society of New Zealand (Marsden Fund grant 13-UOA-121, RF). JML was funded by the DFG research fellowship grant 261553824 Vertical and lateral aspects of Chinese dialect history (2015-2016) and the ERC Starting Grant 715618 Computer-Assisted Language Comparison (cf. http://calc.digling.org). SJG was supported by the Australian Research Council’s Discovery Projects funding scheme (project number DE 120101954) and the ARC Center of Excellence for the Dynamics of Language grant (CE140100041). MH was supported by the ERC Advanced Grant 670985 Grammatical Universals. GAK was funded by NWO Vici project 277-70-012 Reconstructing the past through languages of the present: the Lesser Sunda Islands.
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: Nature Publishing Group
dc.rights: © 2018 The Author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: Scientific Data
dc.title: Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 5
dcterms.dateAccepted: 2018-08-24
dc.date.issued: 2018-10-16
local.identifier.absfor: 200402 - Computational Linguistics
local.identifier.absfor: 200406 - Language in Time and Space (incl. Historical Linguistics, Dialectology)
local.identifier.absfor: 200499 - Linguistics not elsewhere classified
local.identifier.ariespublication: u4485658xPUB1350
local.publisher.url: https://www.nature.com
local.type.status: Published Version
local.contributor.affiliation: Forkel, Robert, Max Planck Institute for the Science of Human History
local.contributor.affiliation: List, Johann Mattis, Max Planck Institute for the Science of Human History
local.contributor.affiliation: Greenhill, Simon, College of Asia and the Pacific, ANU
local.contributor.affiliation: Rzymski, Christoph, Max Planck Institute for the Science of Human History
local.contributor.affiliation: Bank, Sebastian, Max Planck Institute for the Science of Human History
local.contributor.affiliation: Cysouw, Michael, Philipps University Marburg
local.contributor.affiliation: Hammarström, Harald, Max Planck Institute for the Science of Human History
local.contributor.affiliation: Haspelmath, Martin, Max Planck Institute for the Science of Human History
local.contributor.affiliation: Kaiping, Gereon, Leiden University
local.contributor.affiliation: Gray, Russell D., University of Auckland
dc.relation: http://purl.org/au-research/grants/arc/DE120101954
dc.relation: http://purl.org/au-research/grants/arc/CE140100041
local.bibliographicCitation.issue: 180205
local.bibliographicCitation.startpage: 1
local.bibliographicCitation.lastpage: 10
local.identifier.doi: 10.1038/sdata.2018.205
local.identifier.absseo: 970120 - Expanding Knowledge in Languages, Communication and Culture
dc.date.updated: 2019-05-19T08:21:45Z
local.identifier.scopusID: 2-s2.0-85054893182
dcterms.accessRights: Open Access
dc.provenance: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
dc.rights.license: Creative Commons Attribution 4.0 International License
Collections: ANU Research Publications

Download

File: 01_Forkel_Cross-Linguistic_Data_Formats%2C_2018.pdf
Size: 876.69 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.
