RDA and Linguistics

Data are fundamental to the field of linguistics. Examples drawn from natural languages provide a foundation for claims about the nature of human language, and the validation of these claims relies crucially on the supporting data. Yet, while linguists have always relied on language data, they have not always provided access to those data. This disconnect between linguistics publications and their supporting data leaves much linguistic research unreproducible, either in principle or in practice. Without reproducibility, linguistic claims cannot readily be validated or tested, which calls their scientific value into question.

The Linguistics Data Interest Group (LDIG) became an endorsed RDA Interest Group in July 2017, with the aim of facilitating reproducible research in linguistics. The LDIG's scope covers data at all linguistic levels (from individual sounds or words, to video recordings of conversations, to experimental data) and data for all of the world’s languages. The group also acknowledges that many of the world’s languages have high cultural value yet are underrepresented in terms of the information available about them.

The Linguistics Data Interest Group plans to promote the discipline-wide adoption of common standards for data citation and attribution. The LDIG also aims to strengthen education and outreach so that linguists become more aware of the principles of reproducible research and of the value of data creation methodology, curation, management, sharing, citation and attribution. Additionally, the LDIG aims to help ensure that the preparation of linguistic data sets receives greater attribution within the linguistics profession.

Related organisations

  • Digital Endangered Languages and Musics Archives Network (DELAMAN)

  • Linguistic Society of America Committee for Scholarly Communication in Linguistics (CoSCIL)

  • Tromsø Repository of Language and Linguistics (TROLLing)

  • Data Citation and Attribution for Reproducible Research in Linguistics project, sponsored by the National Science Foundation (SMA 1447886)

  • Open Language Archives Community (OLAC)

  • Linguistic Data Consortium (LDC)

  • CLARIN - European Research Infrastructure for Language Resources and Technology

  • FORCE11 Attribution Working Group