The Data in Context Interest Group held a breakout session on Tuesday, 23 September, at the 4th RDA Plenary in Amsterdam. The session was led by Brigitte Jörg, Keith Jeffery, and Rebecca Koskela, who stepped into Brigitte’s place as co-chair of the Interest Group at the end of the meeting. About 20 participants attended the session.
What is Data in Context IG?
The Data in Context Interest Group deals with contextual metadata, that is, information about the context of the data: when, by whom, how, and where the data was collected, and what the data relates to. This information is essential for making the data useful for further use.
The metadata and metadata standards groups are closely related to this group’s work. The approach is therefore to collaborate with the other groups under the umbrella of metadata.
Brigitte opened the session and gave an introduction to the group’s aims, history, approach, and current status. This was followed by the main activity of the session, which addressed the feedback on the use case template. Use cases were created addressing five different fields of research.
Brief history of the group
The initial proposal for the group’s name was the Contextual Metadata group. However, the name was felt to be ambiguous and hard to understand, so it was changed to Data in Context IG. The third plenary accepted Data in Context as an Interest Group. Initial use cases were captured as early as the 1st RDA plenary. However, it was already noticed then that use cases need to be as specific as possible and aligned with other WGs/activities. Four revised use cases were created in the third plenary:
- Researcher: find data
- Manager: indicate to funder
- Provenance: allow extraction of data segments from a streamed data workflow
- Interoperability: exchange of contextual metadata
In the 3rd plenary the IG covered lifecycle models related to the data in context work. Several lifecycle models exist, with different phases, steps, and interconnections.
For example, the UK Digital Curation Centre (DCC) has a lifecycle approach to managing data that includes both a linear sequence of work stages and the cyclic repetition of specific tasks. The elements that constitute this cycle include conceptualization, creation, access and usage, appraisal and selection, disposal, ingestion, preservation, reappraisal, storage, access and reuse, and transformation (http://www.dcc.ac.uk/digital-curation/what-digital-curation). Stakeholders with needs and requirements for the data management lifecycle include, for example, producers and future users of data. These different groups of stakeholders need means to express and use context information related to data.
The third plenary raised the need for a use case template, along with the questions of how it could be connected to lifecycle models and how the context of data could be captured in use cases. This led to the creation of a formalized use case template that is commonly used in the activities of different groups. Metadata on the context of data was added to the template to capture information relevant to future users of the data. This metadata now needs feedback from different stakeholder groups so that it can be revised and clarified.
The main deliverable and contribution of the Data in Context IG so far is the use case template, which supports capturing context-related information. Contribution to standardization work was removed as a deliverable for the time being. The aim of the use case template is to go beyond supporting the discovery of data and to give an indication of data quality through the included context metadata. The use case template can be found here:
Progress since last meeting
Progress since the last meeting has focused on the use case template. First, Brigitte gave an overview of the template’s contents and fields, to prepare for working on and discussing the metadata related to context.
Other metadata-related groups had held feedback sessions on the use case template earlier in the 4th Plenary. The exercise carried out in this group aimed to explore whether the descriptive elements in the use case template (the metadata elements) are satisfactory for describing context.
Exercise in the group to get feedback for the use case template
An exercise was carried out in the Data in Context IG. Two sets of use cases were generated within two high-level categories, as follows:
- running system use cases (existing systems) - use cases created for models for processing data sets, agricultural data sets, and data repositories (event logs for computer sciences)
- future use cases (no running system) - use cases for wheat (combined with agriculture in the previous group), data management plans, and space science
After the participants had created use cases for the areas above, their feedback on the metadata fields for capturing context was collected.
Several issues were raised in the feedback:
- what needs to be put into the fields was not clear (description of the field)
- more generic question was discussed – online guide for fields was suggested
- different user groups exist for the same use case with different scenarios – who needs to use the created piece of context information – organizing to different levels based on the user role – different people have different needs for metadata use
- differing scenarios - e.g. quality may mean different things for creator and users
- spatial coverage, temporal coverage? What does it mean? What to put in these fields?
- Procedure, theoretical background (possibly in related literature?), paradigm may be relevant information in the future.
General feedback from participants included comments that it was useful to go through the exercise of creating a use case, because you come to understand the complexity of even a simple use case. Even a simple use case can have six or seven versions (scenarios).
Participants were asked to send further feedback on the exercise, together with the use cases created in the five groups, to Keith Jeffery and Rebecca Koskela. Further use cases from different disciplines are needed to test the template on context-related aspects.
Remember to send in use cases from the session exercise to Keith and Rebecca!
Keith.Jeffery [at] keithgjefferyconsultants.co.uk and rkoskela [at] unm.edu
At the end of the session Keith presented the future plans of the Data in Context IG. These include: Use cases into repository (DICIG), Standards into MSDWG directory, Analyse for commonalities and differences (MIG), Propose canonical metadata “packages” for “purposes” (MIG), Validation of “packages” (domain Groups), Provision of converters (this is a problem!), Move to standardization of “packages” (RDA).
Own thoughts on the Data in Context IG themes and the use case template
As a researcher working with aspects of user-centered design, usability, and human-technology interaction, I started reflecting on this topic after the group’s session. I kept thinking about the different stakeholders with their own goals and needs, the creation of the use cases to test the template, and the context-related information (metadata) needed to make the data useful in future research and for other researchers. To clarify already here: stakeholders can also be called actors (as in the use case template) or agents (as in some other contexts).
One approach, which has likely already been used in creating the use cases within RDA activities, is to create a key set of basic use cases that are generic and ideally independent of the research field, stakeholder, and data type. These use cases could cover activities related to data management, including, for example:
- Saving data, including the saving of metadata related to the data as a related use case
- Searching for data – with different scenarios of how the search takes place (criteria and starting points of the search)
- Updating data
- Adding complementary (new) data
- Adding links between data sets or other objects (e.g. old and new when replicating research) – this can also be a use case related to “adding data”.
- Getting information about the use of data
- And so forth
These basic use cases may involve different stakeholders in different roles. For example, data can be saved by a researcher, or by a group of researchers in the case of multiple related datasets, but also by a project manager or by someone responsible for data management at the research institution. The basic set of key use cases could be applied and tested in different fields of science and application with varying types of data and datasets, which would also test the applicability of the metadata fields.
In addition, I was thinking about how, as a researcher, I generally come to understand whether a research article or a piece of research is relevant, and how I search for articles and information. This has been studied as well, but here’s a personal view.
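The idea of a generic set of use cases, each with several possible actors, can be sketched as a small data structure. This is only an illustration under my own assumptions: the class name `UseCase`, the helper `use_cases_for`, and all actor assignments other than those for saving data are hypothetical and do not come from the RDA use case template.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """A generic use case and the actors (stakeholders) who may carry it out."""
    name: str
    actors: tuple = ()


# Only the actors for "save data" are taken from the text above;
# the remaining actor assignments are assumptions for illustration.
BASIC_USE_CASES = [
    UseCase("save data", ("researcher", "project manager", "data manager")),
    UseCase("search for data", ("researcher",)),
    UseCase("update data", ("researcher", "data manager")),
    UseCase("add complementary data", ("researcher",)),
    UseCase("link data sets", ("researcher",)),
    UseCase("get information about data use", ("researcher", "data manager")),
]


def use_cases_for(actor, use_cases):
    """Return the names of the use cases in which the given actor takes part."""
    return [uc.name for uc in use_cases if actor in uc.actors]


if __name__ == "__main__":
    for role in ("researcher", "project manager", "data manager"):
        print(role, "->", use_cases_for(role, BASIC_USE_CASES))
```

Listing the use cases per actor in this way makes it easier to see which metadata fields each role would have to fill in or read when the template is tested in a new field.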
I use several strategies in information search such as
- keywords in title and abstract – or in the keywords field,
- prior key research articles to look for more recent articles that cite the key article, or the articles the key article is citing.
After searching by keywords in the title and abstract, I often read the whole abstract of each matching article to understand what it is about and whether it is relevant. A similar free-text “abstract” of the data, stored as metadata, could also be useful for searching and for understanding what the data is about, why it was created, how, for what purposes, and so on. This data “abstract” would describe, in 150-300 words, the background, the kind of study or studies carried out and their goals, the research approach and methods used, the type of data created, or similar.
Creating a good and descriptive set of keywords for a keyword field can be surprisingly hard for researchers. Thinking of everything that might be interesting and important for future researchers searching for this particular type of data can be exhausting. However, researchers are used to writing short verbal descriptions of what they have done (or are planning to do) in the form of abstracts. This type of approach could be useful for describing the data and helping the future “user” get a general understanding of what the data is about. This abstract would be in addition to all the other metadata, naturally. Whether this is a feasible approach would need some more thorough thinking.
My trip to the Fourth RDA Plenary in Amsterdam was enabled by the RDA Early Career Programme travel grant. I am grateful for this excellent opportunity to participate and learn about the RDA activities.
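To make the data “abstract” idea a bit more concrete, here is a minimal sketch of a metadata record with a free-text abstract next to the keywords. It checks the 150-300 word range suggested above and lets a search term match the title, the keywords, or the abstract. The class `DatasetRecord` and its field names are my own assumptions, not part of the use case template or any RDA standard.

```python
from dataclasses import dataclass, field
from typing import List

# Word range for the data "abstract" as suggested in the text.
MIN_WORDS, MAX_WORDS = 150, 300


@dataclass
class DatasetRecord:
    """Hypothetical metadata record with a free-text abstract."""
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)

    def abstract_word_count(self) -> int:
        # Words are counted by splitting on whitespace.
        return len(self.abstract.split())

    def abstract_length_ok(self) -> bool:
        return MIN_WORDS <= self.abstract_word_count() <= MAX_WORDS

    def matches(self, term: str) -> bool:
        # Case-insensitive substring search over title, abstract and keywords.
        needle = term.lower()
        searchable = [self.title, self.abstract, *self.keywords]
        return any(needle in text.lower() for text in searchable)


if __name__ == "__main__":
    record = DatasetRecord(
        title="Wheat yield trials",
        abstract="Field measurements collected to compare wheat varieties.",
        keywords=["agriculture"],
    )
    print(record.abstract_word_count(), record.abstract_length_ok())
    print(record.matches("varieties"))
```

The point of the sketch is that a term such as “varieties” can be found through the abstract even though nobody thought of adding it as a keyword.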