3rd Plenary Meeting Notes (Dublin)

Agenda and Notes for the inaugural meeting of the RDA-Domain Repositories IG in Dublin

Thursday, 27th March 2014

Croke Park Conference Centre, Dublin, Ireland

11:00-12:30, Suite 687.

 

15 people present

Agenda

  • Origins of the IG

  • What are our shared needs and concerns as domain repositories?

  • How can we help each other?

    • What can we share?

    • How can we help new repositories?

  • How can RDA help?

    • Which WGs should we monitor?

    • What problems could a new WG solve for us?

    • How can we promote products of WGs?

  • How should we communicate between RDA Plenaries?

  • Slide for the “Minute Madness”

Origins of the IG

 

George started the meeting off by describing the case statement:

Last year (June 2013) ICPSR convened a meeting of US repositories from about 20 different domains. The topic of the meeting was “How do we preserve data long-term if the funding is short-term?” There are growing demands from funding agencies in the US to make data openly available, but without clear funding to do so. At the meeting the repositories found that they had a lot in common. Making data re-usable (e.g., through metadata creation) plays an important role in all repositories. You cannot make data re-usable without good knowledge of the communities you serve. Outcomes were:

 

  • a white paper addressing sustaining domain repositories.

  • a desire to continue the conversation - thus the IG case statement

 

The RDA Council approved the case statement, so now we need to figure out how to move forward. Another reason to do this is that many WGs in RDA are developing tools and policies that are relevant for domain repositories. It is therefore necessary to raise the voice of domain repositories in the working groups and to keep track of what is going on there, in order to implement the useful outcomes.

 

Introductions:

See list of participants: George (ICPSR, social sciences), Peter (DANS, humanities & social sciences), Bob (astronomy), Francoise (astronomy), Louise (complex qualitative data), Rob (NCSA), Matt (formerly NSF), Steve, Ruth, Prasad, Devika (India), Albert, Vasilis Protonotarios (agriculture, Greece), John (environmental data), Alexa (GESIS, social science quantitative data), Johannes (FAO).

What are our shared needs and concerns as domain repositories?

Bob Hanisch: Sustaining Data Infrastructures

Bob introduced the white paper produced as the outcome of last year’s workshop, “Sustaining domain repositories for digital data” (December 2013). The paper gives an estimate of the cost of archiving data, which for the Hubble Space Telescope (HST) is on the order of 1% of the cost of the HST. Now 60% of HST publications are based on archive data.

 

(A discussion on data citation arose.)

 

The report also lists the various funding models of domain repositories. Of the set of possible mechanisms, the infrastructure model is the best way to go. NASA is mentioned as an example. For astronomers, the sky cannot be the archive, because it changes continuously. Many of the other models (pay per deposit, membership models) are risky.

 

General discussion: Similarities and differences across domains

 

What are the concerns of the domain repositories in the room?

  • Data citation is an issue for ICPSR. The essential fix is for journals to state in their author guidelines how data should be cited. The scholarly communities are only beginning to realize this. Is the current citation culture suitable for solving the problem? Is DataCite an answer? That depends on the level of granularity of the references. This is a subject that the data in context group will be discussing.

  • Institutional repositories or more generic repositories? Bob says that Dataverse is not good enough for finding data if what you are looking for is very domain specific, because the metadata is not deep enough. This is a very hard thing to solve, because what the researcher is after can be something that the generic archive did not think about. There is a “lost in translation” problem here. Semantic interoperability is an answer to this, but very hard to achieve in detail. There is a danger in trying to solve this at too deep a theoretical level; we need to be pragmatic. Certain basic principles help, for instance, publishing metadata standards, thesauri, standard vocabularies, etc. publicly, preferably as Linked Open Data (LOD). A generic top-down approach will not solve the problems.

  • Several people say that these questions are being (or will be) taken up in other groups, with the results then to be implemented in the domains. It is important that the technical solutions are applicable in the real world. Francoise thinks it is very important that this IG play a key role in forcing the technical groups to keep their feet on the ground.

  • This raises organizational questions about the role of the RDA: mapping the problems and who works on what.

  • Domain repositories have to make their expertise available to the institutional repositories. The long tail data group is a good group to interface with.

  • A federated model of data infrastructure in which various parties with different roles work together may be a direction to go. We should keep in mind that cross-domain use of data or interdisciplinary projects also have to be served.

  • How to fund these things? Data management plans (DMPs) should become subject to peer review. NSF allows you to ask for money to manage your data. How to fund data preservation after a project is finished? NSF is thinking about a national data facility for long-term preservation & access. Costing of long-term preservation is still an issue that we have only touched upon. Reserve 2% of the project/data creation costs?

 

How can we help each other?

  • What can we share?

  • How can we help new repositories?

 

George makes the case that two things are crucial:

  • financial/institutional continuity (lobbying) - This is questioned by others in the room: the general consensus is that this is about the continuity of (selected, valuable) data, not about the continuity of the institutions. Some data will become less valuable and may be deselected or destroyed. The exponential growth of data means that old data is becoming small data very quickly. Still, not everything can be saved forever. Conscious selection and deselection policies for digital preservation should go hand in hand.

  • know-how (and the sharing of that, best practices, etc.).

Room 679, 1:30 PM: small meeting to continue thinking about several of the funding/organisational issues.

How can RDA help?  

  • Which WGs should we monitor?

    • all the various data citation WGs

    • long-tail WG

 

At that point we ran out of time...