The application of the FAIR principles depends heavily on rich metadata, yet domain vocabularies remain largely underused in several disciplines. As a result, many opportunities for data reuse are missed because researchers are not engaged in data description. Accordingly, the Expert Group on FAIR Data, in its report Turning FAIR into reality, recommends defining use cases to further engage communities in the FAIR ecosystem. With this in mind, data curators have an essential role to play in strengthening researchers' research data management (RDM) practices, within the limits of what researchers can realistically commit to such practices. The main goal of this work is to foster collaboration between researchers and data curators by engaging researchers in a data curator's workflow for developing domain-specific metadata models.
These metadata models lead to the selection of concepts familiar to researchers, which they can use in more casual descriptions, preferably from the moment they start collecting data, to mitigate existing barriers to metadata creation. The data curator's workflow comprises meetings and interviews with researchers, the development of metadata models formalized as lightweight ontologies, and data description sessions with the researchers, complemented by content analysis of domain publications to overcome communication shortcomings.
To assess the merits of this data curator's workflow, 13 data description sessions were carried out between January 2018 and September 2019 with researchers from diverse domains, using Dendro*, a staging RDM platform developed at the University of Porto. The participating researchers also completed a questionnaire measuring their attitude towards data description.
Overall, researchers produced metadata records of satisfactory or good quality. A total of 178 fields were completed, using 89 different descriptors. On average, researchers needed 27 minutes to fill in 14 descriptors. Metadata elements describing the context of data production, i.e. the study design, were the most used, accounting for 55% of all metadata created. Researchers characterized data description as a slightly demotivating and slightly time-consuming, yet somewhat interesting, moderately easy and moderately practical activity, and rated its usefulness as high.
Altogether, the quality of the metadata records produced and the researchers' feedback on data description support the conclusion that metadata creation is a realistic activity for researchers, provided they are given adequate tools. This data curator's workflow is therefore regarded as a promising approach to engaging researchers in RDM through data description.