Brokering/Data Fabric Joint Session
Session Page: https://www.rd-alliance.org/rda-9th-plenary-joint-meeting-ig-brokering-ig-data-fabric
Slides and the two-page handout can be found on the wiki page at https://www.rd-alliance.org/group/data-fabric-ig/wiki/data-fabric-ig-broker-driven-core-component-workflows
The discussion surfaced the following potential areas of future activity:
- Formalization of how:
  - to describe the broker
  - to encode what the broker does and what roles it plays
  - a broker finds what it needs from the data fabric
- Development of patterns or recipes for the use of brokers, and identification of the decision path for implementing a brokered workflow
- Defining a general standard or framework for data transformations and mappings
The Brokering Framework WG is currently defining the conceptual model of a broker, including the roles a broker can play, a typology, and recommendations for persistent identification of broker instances. The work of this WG could be seen as a prerequisite for tackling the above challenges, or the two efforts could proceed concurrently if interest and need within the community were great enough.
Outcome: The next step is for members of the RDA community (particularly the Brokering and Data Fabric IGs) to consider whether the need is currently great enough for someone to step up and start a new WG, or whether we should wait until the Brokering Framework WG is further along with its work.
Data Fabric IG Spin-off - Remaining PID Challenges
Session Page: https://www.rd-alliance.org/data-fabric-ig-spinoff-session-rda-9th-plenary-bof-meeting
We reviewed the Grouped List of Assertions on PIDs (https://www.rd-alliance.org/sites/default/files/GEDE_PID_Focus_Area_disc...)
Discussion included the following topics:
- What is a "trustworthy repository"? Workflow tools might be considered repositories of a sort, for researchers who want to provide citable references to work in progress.
- Fragment indicators, and the need to separate PIDs from fragment identifiers, whose meaning is interpreted by the repository. Persistent fragment identifiers are not inconceivable but would require community building.
- Whether PIDs can/should be created only at the moment they are needed.
- The need to clarify the recommendations on semantics and PIDs.
- The community lacks formalisms for PID policies around versioning.
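The separation between a PID and a fragment identifier discussed above can be illustrated with a minimal sketch. It assumes, for illustration only, that references use URI-style `#` syntax; the handle `12345/example-object`, the `Reference` type, and the `split_reference` helper are hypothetical. The PID part is what the resolver sees, while the fragment is left for the repository to interpret.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """A reference split into its persistent part and its repository-interpreted part."""
    pid: str                 # handed to the PID resolver unchanged
    fragment: Optional[str]  # interpreted by the repository, not by the PID system


def split_reference(ref: str) -> Reference:
    """Separate a PID from an optional '#' fragment identifier.

    Hypothetical helper: assumes the fragment is everything after the
    first '#', as in URI syntax. None means no fragment was present;
    an empty string means a bare trailing '#'.
    """
    pid, sep, fragment = ref.partition("#")
    return Reference(pid=pid, fragment=fragment if sep else None)


if __name__ == "__main__":
    # "12345/example-object" is a hypothetical handle, not a registered PID.
    print(split_reference("hdl:12345/example-object#page=3"))
    print(split_reference("hdl:12345/example-object"))
```

Keeping the two parts in separate fields makes it explicit that only `pid` carries a persistence guarantee; any persistence of `fragment` would depend on the repository, which is why persistent fragment identifiers would need community agreement.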
Outcome: A clear need was identified to begin formalizing recommendations on PIDs and versioning. It was recommended that we convene a joint session on this topic at P10, including at least the Provenance IG, the proposed Versioning IG, the PID Kernel WG, and the Collections WG. A first goal could be to define what we mean by versioning.
Data Fabric IG Core Session
Session Page: https://www.rd-alliance.org/ig-data-fabric-core-session-rda-9th-plenary-meeting
We reviewed the current activities of the DFIG, as summarized in the session slides at https://www.rd-alliance.org/system/files/documents/P9%20Core%20Session.pptx
The bulk of the discussion centered on the Global Digital Object Cloud (GDOC), as presented by Larry Lannom.
- The general consensus was that the GDOC concept can and should move forward. The idea of an object that advertises its capabilities, along the lines of what the Collections WG has been defining for collections, was well received.
- This may lead to an Object Interface WG. Additional next steps for moving the GDOC concept forward could include implementing it in test beds and identifying infrastructures or organizations willing and interested in experimenting with it.
- We also need machine-readable registries, global mechanisms to link PIDs and metadata, and governance structures.
Joint Session with Education and Training IG
Session Page: https://www.rd-alliance.org/rda-9th-plenary-joint-meeting-ig-data-fabric-ig-education-and-training-handling-research-data
(Slides from presenters can be found on the session page)
The audience for 23 Things was mainly librarians. The program took a user-driven approach, with tasks at different levels, immediate feedback, community forums, and badges. Modules are available in multiple languages for reuse. A next phase will look at applying this approach to increasingly technical concepts. The effort was funded by ANDS and had strong local community engagement.
The target audience of the RDA EU/ENVRI summer school is data scientists and data managers. A significant amount of pre-workshop preparation is needed to ensure that all the pieces function properly together. The content is highly technical, and while it includes RDA outputs (DFT, DTR, PIT), these are not the sole focus of the workshop.
We discussed efforts like those of the EDISON Project (http://edison-project.eu/) to group resources by stakeholder and how they might apply to RDA outputs.
We identified a few different possible audiences for training workshops:
- University Leadership
- Funding Agencies
- Policy Makers
We chose University Leadership as the audience to focus on, because this group has the potential to drive funding for and adoption of RDA activities and outputs, and has not been specifically targeted by existing efforts.
Key motivating factors for engaging this audience include:
- Risk Management
These motivations can be addressed by talking about
In turn, reproducibility leads to the need for RDA outputs such as:
- Repository Certifications
- PID Types
Citation, in turn, leads to the need for:
- Data Citation Recommendation
- Dynamic Data Citation Recommendation
We can also consider social outputs, and address pain points and how costs can be shared (Domain Repositories, Legal Interoperability).
The education effort should focus on introducing high-level concepts and giving direct advice on how and where the university can implement RDA outputs, and which needs they can address.
- It would be helpful if WGs included statements aimed at policy makers in their case statements
- Alternatively, this could be addressed by paid product specialists, if RDA had them
Reaching this audience means reaching out to the AAU in the US, LERU in Europe, CODATA in Africa, and the Commonwealth Universities in Australia.
PID Recommendations Session
Session Page: https://www.rd-alliance.org/ig-data-fabric-recommendations-session-rda-9th-plenary-meeting
Peter presented the GEDE Group document https://www.rd-alliance.org/sites/default/files/GEDE_PID_Focus_Area_discussion_document_v2_%28002%29.docx
Feedback and Questions from Discussion
- What does it mean for a PID and a PID system to be "long lasting"?
  - Need to make clear that we need the identifier to remain resolvable
  - "Make it actionable" should be restated as "make it actionable *on the web*"
  - May also need to clarify what constitutes a "landing page"
- How does one judge the trustworthiness of a PID system?
  - For DKRZ, trustworthiness means trust in the PID maintainer (in their case, themselves) to take care of the links between PIDs and objects so that they do not break, and to show tombstones; plus embedding in a larger framework (EPIC, DONA), so that there is broader faith in the Handle System itself
  - Are these two levels of trust sufficient?
- How open should PIDs be?
  - Should information on the PID system be released under a free license?
    - This is related to the exit strategy
  - There is currently no international board for governance of PID systems; does somebody need to take this on?
- What is the meaning of "governed"?
  - Setting up policies for how a given service provider should operate, and governing the development of the system
  - e.g., Is there a policy for changes? How are requests handled?
  - Is there transparency? Is it run well?
  - EUDAT has designed a set of business processes for making such a federation work (e.g., if you need a new handle prefix, where do you go and what do you do?). A first draft is done but not yet implemented; one of the next steps might be a first rollout in a larger PID federation
- What about privacy considerations (e.g., monitoring who is using which PID)?
- Possible refinement: do not include any transient property (e.g., owner, location) in the identifier
- Suggestion: change "not good practices" to "not required"
- "PID resolution system should not be dependent upon semantics of the identifier"
- To participate in GEDE meetings, send an email to Peter, Carlo, or Maggie
- Next virtual meeting in May.
- Next topics for bundles/recommendations:
  - Versioning - Maggie and Carlo will initiate work on this
  - Repositories - Peter will initiate work on this
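The recommendation raised in the discussion, that an identifier should not embed transient properties such as owner or location and that PID resolution should not depend on the identifier's semantics, can be illustrated with a toy sketch. The `OpaquePidRegistry` class, the prefix `12345`, and the example URLs are hypothetical; an in-memory dictionary stands in for a real PID service such as a Handle server. Because the suffix is a random UUID, moving an object changes only the record, never the PID.

```python
import uuid
from typing import Dict


class OpaquePidRegistry:
    """Toy in-memory stand-in for a PID service (hypothetical, not a real API)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        # PID -> current location; the location is mutable, the PID is not.
        self._records: Dict[str, str] = {}

    def mint(self, location: str) -> str:
        # The suffix is random: it encodes no owner, location, or type,
        # so it stays valid when any of those properties change.
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._records[pid] = location
        return pid

    def move(self, pid: str, new_location: str) -> None:
        # Relocating the object updates the record, never the identifier.
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_location

    def resolve(self, pid: str) -> str:
        # Resolution is an exact-match lookup; no part of the string is parsed.
        return self._records[pid]


if __name__ == "__main__":
    registry = OpaquePidRegistry("12345")
    pid = registry.mint("https://repo-a.example.org/objects/42")
    registry.move(pid, "https://repo-b.example.org/objects/42")
    print(pid, "->", registry.resolve(pid))
```

The design choice to treat the whole string as an opaque key is what keeps resolution independent of identifier semantics; a real service would also need persistence, access control, and tombstone records for deleted objects, none of which this sketch attempts.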