2015-09-24 RPRD P6 Joint Meeting

Joint meeting of IG Libraries for Research Data, IG Long Tail of research data & IG Repository Platforms for Research Data

24 September 2015 - BREAKOUT 5 - 13:30
Report made by Mersiha Mahmić-Kaknjo

Co-chairs: Kathleen Shearer, Michael Witt, Najla Rettberg, David Wilcox, Stefan Kramer, Ralph Pfefferkorn
Moderator: Wolfram Horstmann
1. Welcome and Synchronization Section – Wolfram Horstmann 
Short keynote by Wolfram Horstmann:
The Repository Platforms for Research Data group was formed during the last Plenary. The session will discuss use cases relevant to all three groups, and Kathleen will present the results of a poll. Each group will have 5 minutes to introduce itself.


- IG Libraries for Research Data – Michael Witt
Unlike the other two groups, this group has no output agenda, just potential for collaboration. Quick overview: 150 subscribers at the moment, mostly from academic and research libraries. There are topics of mutual interest. The first BoF meeting took place at P2. Some outputs from the IG were presented: a special issue on research data management in the IFLA Journal, 2 briefing papers, a joint RDA-IFLA program, and the RDA Sloan Data Share fellowship.
- IG Long Tail of Research Data – Kathleen Shearer
Some challenges needed special coverage. The majority of datasets belong to the long tail. Whether "long tail" is the right term will be discussed later. There is a lot of interest in what the group is doing; it became a formal group at P2. Interoperability was the first issue discussed: what the metadata practices at different repositories are. A large issue is harmonizing the metadata. There was discussion on convincing researchers to deposit their data in repositories; it is easier to deposit in domain repositories. The long tail has some special challenges: what kinds of practices have been used, and what kinds of tools could be provided to persuade researchers to deposit their data.
Engaging outside of RDA: Wolfram has given a presentation at RDA EU, and Kathleen presented at EUDAT. David is talking to repository platforms, both long tail and other.
- IG Repository Platforms for Research Data – Ralph Pfefferkorn
Repositories play a crucial role, and usually more than one repository is involved. Definition: a repository is a searchable and queryable interfacing entity that is able to store, manage, maintain and curate data/digital objects. There are too many verbs to care about in this definition. Of the three groups, this is the only IG with a goal and a time frame to fit in. Goal: a collection of requirements on repository software, agnostic to specific tools or products. The second task is to look at what other people have done and at the current repository world.
The envisioned end result is a matrix. The work consists of 3 points: listing the requirements, rating the importance of each requirement in the matrix, and publishing it so that repository developers and providers can see what users really need. In the end, these findings will help communities get better products that really fit the needs of users.
Q from the floor: Are there going to be more joint use case matrices?
A: There will be just one use case matrix. It would be very relevant for libraries; the main goal is to prevent duplication of effort and to stimulate the collection of use cases.
Q: There are already 18 use cases collected in other IGs.
A: Very good suggestion. We are interested in taking a look at other groups' use cases.
Q: Do you know any other exercises which do the same collecting?
A from the floor: Start thinking about how these could be gathered from the other groups via the RDA web page.
A from the floor: It is more of an organizational effort; is there a possibility that RDA could help in some way?
RDA is aware of the issue. In Germany there is a collection of user stories.
David: The initial idea was to use user stories, which are simple stories in which users describe what they want. But the group wanted longer stories that describe the whole repository architecture. If one were to build a repository, one would use use cases.
Natasha (developer of a repository) Q: How can one make sure that users know what the requirements are? How should a repository's requirements be defined?
A: It is not only researchers that should be involved; there are also granting agencies, academic institutions, institutes, etc. All the stakeholders could come up with some protocols. It is not a one-to-one interaction between researchers and repository makers; the other stakeholders should be included.
David: Not many of these other stakeholders attend these meetings, so some connections to them should be made. The point is very well taken, including what you did in order to specify the requirements for your repository. It is impossible to reach all the stakeholders. Researchers and repository makers should try to make them aware that this is a really important task.
Comment from the floor: Use case development, dynamic citation and data types may have some relevance for repository platforms.
A: It is important not to duplicate effort.
Q: There is an important issue to be raised: researchers do not express their needs in the same language as repository managers, so their expressions need to be translated into meaningful requirements. There is also another issue: researchers are also users of these repositories, so researchers come in at both ends.
A: Completely agree. We assumed that repository managers have discussed use cases with stakeholders. But it is still important to harmonize researchers' and repositories' vocabulary.
Q: Is there a way to submit ad hoc requirements?
A: I do not want to make a unilateral decision; we should consult the group, and the result should come out of a discussion in the group. At the moment we are trying to understand what the real needs of stakeholders are. This could also be an outcome: finding information on repositories in terms of 2-3 features. Repositories used for publications are quite similar; research repositories, by contrast, are extremely different.
Q: Have you tried to actively contact repositories' managers?
A: We have set up a mailing list, so as not to duplicate the effort.
Final comment: We call for use case submissions.
re3data can organize a call in the library and long tail groups.
2. Use Case Matrix for RDM Repositories (IG Repository Platforms) – David Wilcox
David works for the open source Fedora project. The goal is to help the community understand what users need. A template with a short description of a use case was developed and circulated inside the group. It was also discussed at the group's virtual meeting, which is held once a month. Different ways to distribute it were discussed; it was difficult to fit everything into a 1-page document, which is now publicly accessible.
At the very bottom of the template there is a list of requirements, derived from the use cases. There were 4 questions: 1. motivations and outcome - what the repositories are, what the needs are, what a repository should accomplish; 2. functional description - a diagram/picture can be provided; 3. achieved results - describe results if applicable; 4. requirements for the described use case. An example use case was presented. A description in the user's own words is very valuable, speaking in terms of what the user needs. It was important to find out why the user needs a given requirement and what the user gets from it. A provisional use case was also provided so people could get an idea of what a use case is. A few months ago the group started soliciting use cases and already has 6 submissions. Tomorrow the group will go through these use cases and review them and the template itself.
There is an obvious diversity of use cases, some dealing with domains, some with platforms, etc. The assumption is that many of the functional requirements are familiar; what has to be established is the best format in which to present them. There are a lot of themes. Requirements on PIDs differ, but a fairly common theme seems to be a platform that supports a whole list of PIDs, authentication and single sign-on. Since there are different groups of users, a lot of common requirements arise. Versioning and audit history seem to be important and fairly common so far. Scalability is also a common requirement. Some common themes are popping up from these use cases, and it is expected that other themes will emerge too.
Help is needed, and that is why the group is in this session: other groups most probably have their own use cases, so there is an appeal to submit them. Later on it will be figured out where the gaps are. At the moment everything is open; there is a publicly available collection of use cases on the RDA web page.
There is a template at the top of the page; it can be downloaded and filled in, and it is really simple. Use cases need to keep being collected, and afterwards people need to be assigned to edit them, do some analysis, and determine where the gaps are. Maybe there are areas which are not represented. Then a map of functional requirements could be developed, a kind of matrix.
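Note added for illustration (not presented at the meeting): a minimal Python sketch of how filled-in templates might be turned into such a requirements matrix. The field names simply mirror the four template sections described above; the class, the function names and the three sample use cases are hypothetical and were not part of the session.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UseCase:
    """One filled-in template; fields mirror the four template sections."""
    title: str
    motivation: str                      # 1. motivations and outcome
    functional_description: str = ""     # 2. functional description
    achieved_results: str = ""           # 3. achieved results, if applicable
    requirements: List[str] = field(default_factory=list)  # 4. requirements


def build_matrix(use_cases: List[UseCase]) -> Dict[str, Dict[str, bool]]:
    """Return {requirement: {use case title: requested or not}}."""
    all_requirements = sorted({r for uc in use_cases for r in uc.requirements})
    return {
        req: {uc.title: req in uc.requirements for uc in use_cases}
        for req in all_requirements
    }


def coverage(matrix: Dict[str, Dict[str, bool]]) -> List[Tuple[str, int]]:
    """Count the use cases asking for each requirement, most common first."""
    counts = {req: sum(row.values()) for req, row in matrix.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical submissions, invented purely for illustration.
use_cases = [
    UseCase("Domain archive", "Preserve survey data",
            requirements=["PID support", "versioning", "scalability"]),
    UseCase("Institutional repository", "Share long tail datasets",
            requirements=["PID support", "single sign-on", "versioning"]),
    UseCase("Lab platform", "Track changes to raw data",
            requirements=["audit history", "versioning"]),
]

matrix = build_matrix(use_cases)
for requirement, count in coverage(matrix):
    print(f"{requirement}: requested by {count} of {len(use_cases)} use cases")
```

Requirements requested by only one use case, or expected requirements that never appear as a row, would be one simple way to spot the gaps mentioned above.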
Initially they were a WG, engaged in comparing existing repository platforms. At the moment they have close to 80 members and meet virtually every month, with 8-10 people present each time.
3. Action poll on RDM Tools (IG Long Tail of Research Data) - Kathleen Shearer
This is a really good transformation: the purpose is to identify important functionality and then build this functionality in, so that nobody has to run after researchers to deposit in repositories. The poll was somewhat ad hoc; a lot of people in the room helped to create it. It had 8 quick, short questions and was run through SurveyMonkey. The aim was to define some common functionalities. Not much analysis has been done yet and the results are preliminary. Over half of the respondents were researchers, the majority of them from the life sciences and social sciences.
Q1 in the poll asked respondents to identify 3 tools. 60 different tools were identified, showing a huge variety, e.g. R, Excel, LabTablet, SAS, SPSS. The tools researchers identified would probably be different from the ones librarians identified.
Q2 - what features are the most important.
Some of the answers were: ease of use, flexibility, capability, capacity to analyze large data sets, enabling sharing, open source, interoperable, commonly used, etc.: a long tail of features.
Q4 - do they use a registry to find tools? The majority answered no.
The next steps would be to do further analysis by research domain and job category, map tools to stages of the life cycle, publish the results on the website, ignite a discussion about how we can better support researchers' workflows, and re-evaluate priorities.
Q: Users use registries but are not aware of it. There were probably not many physical scientists in the survey.
A: Yes, the sample was not very large; it is a small piece of a much bigger puzzle. 100 responses is not much, and bigger samples are needed.
Q: Is there a lesson learned from this action?
A: It was distributed through RDA, so it is hard to know. There could have been more respondents.
Q: There is a relevance to the Education group, since the point of the slide is how to better support users. The Education group is doing a survey on training; maybe the efforts could be joined.
A: Thank you for this idea.
Q: People often use tools that are popular in their disciplines but do not offer the right features.
Q: This was an effort to identify commonalities (rather a pilot than a comprehensive action), which could be helpful in the next phase. 
A: These are very helpful comments. Yes, these were not final results; it is clear that the analysis has to be extended, but there are already some conclusions to be drawn.
Q: The focus is on interoperability and on how repositories could be better; are there any other ideas? Do we want to categorize data as long tail? It marginalizes them; is there a better term?
Cristina: There is no alternative at the moment; let's just talk about data.
Discussions about the long tail attract a lot of people; in Riga there was an interesting discussion. There is a perception that long tail data is an "other" kind of data, although a lot of real innovation work was built on long tail data.
Q: Long tail data are also heterogeneous.
Q: There is a perception that money is needed to get big data, and meetings are needed to get long-tail, small data.
Q: Long tail data is a complex and diverse area.
Q: Not just diversity, also serving communities, service provisioning of data.
There is a lot of support from policy; maybe something else has to be done now.