Citation of research data for linguistic publications
Short introduction describing the scope of the group and any previous activities
The Linguistics Data Interest Group (LDIG, https://www.rd-alliance.org/groups/linguistics-data-ig) works within RDA to identify, prioritize, and address data challenges across the language sciences. The LDIG targets data at all linguistic levels, from individual sounds or words to video recordings of conversations to experimental data. It covers data for all of the world’s languages, and acknowledges that many of these languages have high cultural value yet are underrepresented with regard to the amount of information available about them.
The LDIG objectives include, but are not limited to, discipline-wide adoption of common standards for data citation and attribution, and improvement of research data management training in the discipline. The interest group is aligned with the RDA mission to improve open sharing of data. A first published version of the Austin Principles for Data Citation in Linguistics, the topic of the LDIG working session at the 10th RDA Plenary, is now available from http://site.uit.no/linguisticsdatacitation/austinprinciples/. At the 11th RDA Plenary, LDIG took the citation issue one step further and discussed citation and metadata standards for the field of linguistics, as well as the relation between the citation and the dataset metadata. At the 13th RDA Plenary, LDIG will propose formats for data citation in linguistics, which in turn will be integrated into existing style sheets and disseminated to publishers, archives, and the researcher community.
In addition to annual physical meetings, the co-chairs hold regular Skype meetings and otherwise communicate in writing. The group has 90 members, who are invited at irregular intervals to asynchronous meetings where they can share thoughts, opinions, and examples on different topics. This way of communicating with the group has proven successful and has been very valuable for the work in progress. For activities related to citation and metadata standards, and for the editing of the style sheet section, a small group of experts has been invited to collaborate extensively with the co-chairs.
LDIG is active in outreach: its co-chairs have presented at several linguistics conferences and other events to encourage discussion within the scientific community of data citation and the transparency of science (see a selection of presentations at https://site.uit.no/linguisticsdatacitation/publications/).
Additional links to informative material related to the group
- LDIG group page: https://www.rd-alliance.org/groups/linguistics-data-ig
- LDIG Charter Statement: https://www.rd-alliance.org/group/linguistics-data-interest-group/case-statement/linguistics-data-interest-group-charter
- The Austin Principles of Data Citation in Linguistics (V 1.0): http://site.uit.no/linguisticsdatacitation/austinprinciples/
Meeting objectives:
- Present a synthesis of existing initiatives on data citation in linguistics, including challenges specific to the discipline of linguistics and its publication channels.
- Draft/edit a section on citation of research data for inclusion in the Unified Style Sheet for Linguistics (https://www.linguisticsociety.org/resource/unified-style-sheet) and the Generic Style Rules for Linguistics (https://zenodo.org/record/253501#.XDCWavx7mV5), covering citation formats for both reference lists and numbered examples.
- Make a timeline for integration of the data citation section into the style sheets, and for disseminating it in relevant communities (publishers, archives, researchers).
The main goal of this meeting is to edit a section on data citation to be integrated into two existing style sheets for linguistics: the Unified Style Sheet for Linguistics (https://www.linguisticsociety.org/resource/unified-style-sheet) and the Generic Style Rules for Linguistics (https://zenodo.org/record/253501#.XC4fz_x7mV4). Both are widely used but currently contain no guidance for citing data. More specifically, building on existing initiatives such as CLARIN NL (https://dev.clarin.nl/node/4238), the data citation section will contain guidelines on formats for in-text citations, reference lists, and numbered examples, with a careful selection of examples to make it relevant to all subfields of linguistics. The section will also address the question of granularity, i.e. the citation of a single file vs. a subset vs. a full dataset.
Another goal of the meeting is to decide how to disseminate the style sheet in relevant communities (i.e. publishers, archives, researchers). Publishers will benefit from having a ready-made section to incorporate into their author guidelines. Archives will learn which metadata must appear on the dataset record for the dataset to be properly citable. Researchers will 1) know how to cite their data when author guidelines offer no guidance, and 2) know more about what to require from the archive they select for archiving their data.
We invite meeting participants to contribute actively to editing the data citation section and to drafting the integration/dissemination timeline.
Meeting work plan:
- 0-15 minutes: Presentation of existing initiatives and challenges
- 15-70 minutes: Editing of section on citation of research data
- 70-90 minutes: Writing a timeline for next steps (integration into style sheets and dissemination)
The session will be chaired by Helene N. Andreassen (UiT The Arctic University of Norway) and Andrea Berez-Kroeker (University of Hawai’i at Manoa), two of the LDIG co-chairs. A small group of LDIG key members will also take an active part in preparing and running the meeting.
We will try to offer remote participation. Those who cannot attend in person are encouraged to submit thoughts, questions, and examples in a working document, which will be made available two weeks prior to the session. The meeting will be audio recorded, and all written presentations and notes will be made available through the LDIG RDA page.
Target audience:
- Linguists and language scholars, both those who publish/cite their own data and those who use/cite other researchers’ data
- Publishers, of both research publications and datasets, commercial and institutional
- Data archivists, both specific to language and more general
- Members from relevant RDA interest/working groups, e.g. (but not limited to) the Data Citation Working Group and the Empirical Humanities Metadata Working Group
Participants should prepare for the meeting by reading the Austin Principles of Data Citation in Linguistics (V 1.0), available from http://site.uit.no/linguisticsdatacitation/austinprinciples/, as well as the working document for this meeting (made available two weeks prior to the meeting).
Group chair serving as contact person: Helene N. Andreassen