Final Notes for January 31, 2018 DFT meeting

08 Feb 2018

Final Notes for January 31, 2018 DFT meeting (Gary Berg-Cross and Claire Austin)

This was an organizational meeting to discuss topics for the P11 DFT session and to set a time for the February Virtual Meeting:

(Friday Feb 16th same time 10 EST and 3 CET.)

Attendees

Gary Berg-Cross

Mike Conlon

Raphael Ritz

Thomas Zastrow

Stuart Chalk

Claire Austin

 

The topics discussed in preparation for the February meeting and P-11 were:

 

1. P11 Status, plans and related efforts

There is 1 session planned for RDA P-11 in Berlin, a regular DFT IG meeting, although presentations at other sessions are possible. (Simon Hodson will be there, but will he be able to attend a DFT session?) For some participants we may need remote participation. Gary thought we might try a panel discussion.

 

There is no joint meeting planned with VSIG, although some joint work is possible. Gary may talk with the co-chairs about plans (Adam Shepard and perhaps Simon Cox, who has some supported work).

 

There is a NIST big data effort and apparently an IEEE pre-conference we might be involved with. Claire indicated that she is aware of this effort.

 

2. Update on terminologies

Gary has made a modest number of additions to the vocabulary on the Term Tool (TeD-T; see http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page).

Updates to the vocabularies are still ongoing, including input from the recent 6-word exercise.

Other potential sources include items from Larry's C2CAMP briefing (see below), the Mapping the Landscape IG, and perhaps Vocab-services.

There are also terms from the chairs collab meeting.

 

C2CAMP examples include:

  • Science Ecosystem

  • digital object model

  • tightly associated metadata

  • Object typing

  • Mapping services

  • Global Digital Object Cloud

  • virtually aggregated digital objects

  • Solution Space

  • etc.

 

We need definitions for these, as well as for terms from other RDA groups. Gary will contact Larry to see if he has any to offer.

Gary also thought that we might update the “core” terms and provide some visualizations of them to help understand and communicate the expanding RDA message.

 

3. International Research Data Management (IRiDiuM) Glossary

Collaboration with IRiDiuM may also provide some terms. We are open to sharing and expansion. IRiDiuM is developed and maintained by CASRAI in collaboration with CODATA. The glossary is kept in a wiki (http://dictionary.casrai.org/Category:Research_Data_Domain). New terms and definitions are developed in an online Forum. In both cases, the history is preserved and discussion threads are made public. The workflow has been modified this year for greater efficiency and tighter deadlines.

The first IRiDiuM expert panel review cycle developed 69 new terms and definitions last summer. Of these, 39 were moved into the glossary, while 30 that had received public comment were carried forward into the 2nd review cycle for further work in February 2018. Once the latter are finalized, additional terms submitted from various groups will be added to the 2nd review cycle for work in March. See, for example, Appendix 2.
 

4. Other sources of terms and definitions

NIST's Big Data Interoperability Framework WG has developed definitions (see https://bigdatawg.nist.gov/home.php). ISO Standards also often include terms and definitions.
 

There was a group discussion about a process for keeping up with terms and the need for a registry. One phenomenon is that some terms require alternate definitions in different contexts.

How do we handle that?

Claire observed that it is difficult to keep up with various groups that develop short lists of terms and definitions for their immediate needs.

It would be useful if they had an easy place to turn to see if definitions already exist.

Mike Conlon suggested a registry like Ontobee.com, but for data management terms and concepts.

He suggested we need a master site for data vocabularies.

One question was whether this is in scope for RDA and DFT.

Raphael thought that integration efforts are our strength, so we may be a good place to start a discussion of this.

 

Based on these 2 discussions Gary proposed 2 actions:

  1. Discuss developing a process to resolve this problem of alternate definitions, and how groups can cooperate on vocabulary development.

  2. Discuss developing a registry for data management vocabularies.
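To make the two actions concrete, here is a minimal sketch of how a registry could hold several context-specific definitions for the same term and report which terms have competing definitions. It is an illustration for discussion only, not a design that the group agreed on. The class names, the fields and the placeholder definition texts are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    context: str  # hypothetical field: the group or domain using this definition
    text: str
    source: str   # hypothetical field: who contributed the definition


class VocabularyRegistry:
    """Toy in-memory registry that allows several definitions per term."""

    def __init__(self):
        # normalized term -> {context: Definition}
        self._entries = defaultdict(dict)

    def register(self, definition):
        key = definition.term.strip().lower()
        self._entries[key][definition.context] = definition

    def lookup(self, term, context=None):
        """Return every definition of a term, or only the one for a given context."""
        found = self._entries.get(term.strip().lower(), {})
        if context is None:
            return list(found.values())
        return [found[context]] if context in found else []

    def contested_terms(self):
        """Terms that carry more than one context-specific definition."""
        return sorted(t for t, defs in self._entries.items() if len(defs) > 1)


# Placeholder entries; the definition texts are invented for illustration.
registry = VocabularyRegistry()
registry.register(Definition("provenance", "group A", "placeholder text A", "example"))
registry.register(Definition("provenance", "group B", "placeholder text B", "example"))
registry.register(Definition("lineage", "group A", "placeholder text", "example"))

for definition in registry.lookup("provenance"):
    print(definition.context, "->", definition.text)
print("Terms needing reconciliation:", registry.contested_terms())
```

A group drafting a short list of terms could call `lookup` first to see whether definitions already exist, and `contested_terms` gives the cooperation process a starting list of terms to reconcile.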

 

Each person should think of whom to invite to the February meeting for a larger discussion of these actions.

 

5. Enhancements to the Term Tool and plans for additions

 

Thomas briefed this and asked if we should have a new version of the vocabulary for the Plenary. Should it be V 1.5 or 2.0?

 

A recurring question concerns term PIDs and their granularity. Do we need one for every term, as now in Version 1? Maybe that is not needed, because we can use anchors on the page that has the PID.

(The old version would stay there, but new versions would operate with a page ID and anchors. It is easier to use just 1 PID and anchor the others.)

We will bring this up at P-11 for the community to decide. We need to let them know ahead of time to see what they think. It is a motivator for online and Plenary discussion.
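As input for that discussion, the "1 PID plus anchors" option can be sketched as follows: a single page-level PID is combined with a per-term fragment to give each term its own reference. This is only an illustration under stated assumptions. The PID URL is a made-up example.org address, the function names are hypothetical, and the slug rule for anchors is one possible choice, not how the Term Tool works today.

```python
import re
from urllib.parse import quote, urldefrag

# Hypothetical page-level PID for a new vocabulary version (not a real identifier).
PAGE_PID = "https://example.org/pid/dft-vocabulary-v2"


def term_anchor(term):
    """Turn a term label into a fragment-safe anchor, e.g. 'Digital Object' -> 'digital-object'."""
    slug = re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")
    return quote(slug)


def term_reference(page_pid, term):
    """Combine the single page-level PID with a per-term anchor."""
    base, _ = urldefrag(page_pid)  # drop any fragment already present
    return f"{base}#{term_anchor(term)}"


def split_reference(reference):
    """Recover the page PID and the term anchor from a term reference."""
    base, fragment = urldefrag(reference)
    return base, fragment


for term in ["Digital Object", "Persistent Identifier (PID)", "metadata"]:
    print(term_reference(PAGE_PID, term))
```

One consequence of this design is that only the page PID has to be minted and maintained. The anchors stay valid only as long as the page keeps the same anchor names, so the naming rule would need to be fixed per version.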

 

Thomas also discussed the work of the Collections group. They are using some DFT vocabularies and may have new, useful ways of grouping terms.

 

This is new functionality for vocabularies that comes from ongoing work with a tool called Reptor. Reptor is a PHP application which turns a webserver into a data repository. It demonstrates the functionality of a modern data repository using standards such as Dublin Core, in line with the recommendations of the Research Data Alliance (RDA) for persistent identifiers and minimal metadata. For information on the current installation see: http://dft-rda.esc.rzg.mpg.de/reptor/

 

6. Raphael: RDA work identified as ICT Technical Specifications

Our work was recognized as a specification that can be cited. This means our work can be referenced in public procurement, primarily to enable interoperability between devices, applications, data repositories, services and networks.

We have official status under the EU public procurement legislation as a “Common Technical Specification”, which requires compliance with Regulation No 1025/2012, Annex II.

See https://datashare.mpcdf.mpg.de/s/0rq5kVmMlv0h41X for a briefing on this.

 

Raphael can present this material at P-11.

 

The meeting concluded by identifying a target date for the next DFT meeting: Friday Feb 16th, same time (10 EST and 3 CET).

 

7. BoF Domain Vocabulary Activities (No sessions scheduled at P11)

The P10 BoF produced some useful sharing of ideas. This has been followed up by:
 

2 Ontolog sessions organized by Gary Berg-Cross on semantics for Domain Vocabularies. See http://ontologforum.org/index.php/DomainVocabularies

 

Vocamps sessions have been organized by Gary Berg-Cross for Chem Safety (continuing) and materials science.

 

Additional vocabulary work for the Chemistry Research Interest Group has a focus on the IUPAC "color books" (see https://rd-alliance.org/ig-chemistry-research-data-working-rda-8th-plenary-meeting).

 

Appendix 1

Raw list of terms of particular interest to the RDA Assessment of Data Fitness for Use WG:

accessibility

anonymization

certification

certified product

curation

data collection

data management

data quality

data usability

dataset

discoverability

findability

fitness for purpose

fitness for use

interoperability

license

lineage

metadata

peer review

pid

preservation

provenance

reusable

stewardship

timestamp

usable data

version control