I am a digital archivist working at the Digital Repository of Ireland, the organisation which co-hosted RDA Plenary 3 in Dublin with Insight and the Australian National Data Service in March 2014. As RDA3 was in my home town it was easy for me to attend, and in September 2014 I was lucky enough to be awarded an Early Career Researcher bursary to attend RDA4 in Amsterdam.
My work as a digital archivist involves research in a number of areas and contributes to the development of the Digital Repository of Ireland and its policies. These areas include IP and licensing, metadata harmonisation, data citation and persistent identifiers. For this reason I chose to report from the two Persistent Identifier Information Types WG breakout sessions.
The Persistent Identifier Information Types (PIT) Working Group was one of the first groups to be established at RDA, and has been the first to deliver its planned outputs, which include a final report, a demonstrator GUI, prototype source code and API documentation. These outputs are all available now via the WG's page on the RDA website.
The PIT WG statement describes the purpose of the group as follows: "In complex data domains, unique and persistent identifiers (PIDs) associated with specific information are the core of proper data management and access. They can be used to give every data object (including collection objects) an identity that enables referring to the data resources and metadata and, additionally, to prove integrity, authenticity and other attributes. But this requires a PID to be uniquely associated with specific types of information, and those types and their association with PIDs must be well managed. Therefore it is useful to specify a framework for information types, to start agreeing on some essential types, and to define a process by which other types can be integrated. The framework provides generic facilities only, which can and must be employed by specific communities to support their needs. The focus of the working group therefore is on cross-community concerns."
As RDA4 marked the completion of 18 months of the PIT WG, two breakout sessions were held, firstly to provide an overview of the group's work to date, and secondly to garner feedback and comments on the group's outputs and possible next steps.
Breakout session one: PID Information Types 1, 22 September 2014, 3.30-5pm
Chairs: Tobias Weigel and Tim DiLauro
The first breakout session began with a recap of the group's work to date. Some of the early issues with the topic were acknowledged, for example the initial lack of a common understanding of the term "type" in this context. The group's deliverables, which include the demonstrator GUI, were outlined. It was flagged that although the deliverables were potentially ready for community uptake, there was not yet a process in place to encourage this. A long-term goal for the group's work was also identified: that in the future, PID and PIT APIs sit side by side, so that this information is created for data at the same time as the PID.
A technical demonstration was then given by Thomas Zastrow, which led to an interesting discussion with a number of questions from the floor. One suggestion was that a PID prefix could be associated solely with one PID Information Type, preventing the creation of PIDs which did not conform to this type. However, Tobias noted that the purpose was to create an assertion about the PID Information Type, rather than to prescribe a type.
During this session it was also flagged that there is related work happening in another Working Group, the Type Registry WG, which has its own prototype, and that there are interdependencies between the PIT WG and the Type Registry WG.
The session was then adjourned until the following day.
Breakout session two: PID Information Types 2, 23 September 2014, 11.30-1pm
Chairs: Tobias Weigel and Tim DiLauro
The second breakout session focused on potential adoption, any open issues, and a discussion on future work.
Firstly there was a discussion on the potential licensing of the source code which was generated by the Working Group, as RDA does not prescribe a particular licence for code. It was suggested that an open licence such as Apache or Creative Commons would be appropriate, but it was also noted that code from other libraries had been reused and the licences applied to this code must also be respected. A CC0 dedication was rejected for the code, as it does not require attribution to RDA and the Working Group. Following on from this there were questions from the floor on the potential licensing of new Types added to the Registry, and it was agreed that CC0 would be appropriate for these, with attribution suggested but not required of users. There was also a query on the licensing of the documentation for the API, and it was confirmed that RDA supports the application of CC-BY (Creative Commons Attribution) or CC0 (Public Domain Dedication) in such cases.
Another question from the floor asked about possible overlaps with other RDA Working Groups, for example with the Metadata Working Group, which considers some similar fields, such as the designation of authors and use of ORCIDs. Tobias clarified that types such as “author” are not really appropriate for the use cases put forward for this Working Group, and that types such as checksum or fixity are more appropriate; the types should not overlap with information which already exists in a PID record.
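To make the checksum example a little more concrete, the short Python sketch below shows one way a typed entry attached to a PID could be used to check fixity. It is only an illustration under stated assumptions and is not the Working Group's prototype or API: the `PidRecord` class, the `register` and `verify_fixity` functions, the type name `example:sha256-checksum` and the identifier `example-prefix/object-1` are all hypothetical names invented for this sketch.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical type name; a real deployment would presumably refer to a
# type registered in a type registry instead.
CHECKSUM_TYPE = "example:sha256-checksum"


@dataclass
class PidRecord:
    """Hypothetical PID record: an identifier plus typed entries."""
    pid: str
    # Maps a type name to the value asserted for that type.
    properties: dict = field(default_factory=dict)


def register(pid: str, data: bytes) -> PidRecord:
    """Create a record that asserts a SHA-256 checksum for the data."""
    digest = hashlib.sha256(data).hexdigest()
    return PidRecord(pid=pid, properties={CHECKSUM_TYPE: digest})


def verify_fixity(record: PidRecord, data: bytes) -> bool:
    """Return True if the data matches the checksum held in the record."""
    expected = record.properties.get(CHECKSUM_TYPE)
    if expected is None:
        raise KeyError(f"{record.pid} has no entry of type {CHECKSUM_TYPE}")
    return hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    record = register("example-prefix/object-1", b"some archived bytes")
    print(verify_fixity(record, b"some archived bytes"))
    print(verify_fixity(record, b"some altered bytes"))
```

Because the entry is keyed by a type name rather than by a free-text label, a client that has never seen this particular record can still tell that the value is a checksum and how to use it, which is the kind of cross-community use the session discussion pointed towards.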
With regard to adoption, there has already been interest from Dataconservancy, NIST and DNB. It was noted that although adoption should be encouraged, it cannot be forced.
Possible follow-ups for the Working Group were captured as: an investigation of discipline-specific types; the establishment of a type ecosystem; continued work on the data model; and enhancement of the REST API.
Finally it was stated that this would be the final session of the PIT Working Group, and the meeting was closed.
All of the documentation and deliverables are available now at https://www.rd-alliance.org/group/pid-information-types-wg.html.
I attended a number of other sessions at RDA4, including the plenary sessions, the Science stream, and working group/interest group sessions including Data Citation: Making Dynamic Data Citeable and Publishing Data Workflows. Every session had a mix of contributors from a variety of backgrounds, and all included interesting debates and new perspectives on their respective topics.
I would like to thank RDA for providing the funding which allowed me to attend RDA4. It was an extremely worthwhile experience and I hope to attend future plenaries if possible.