

Maggie –
Many thanks for responding to my somewhat provocative email.
1. F1 states that (meta)data are assigned a globally unique, eternally persistent identifier. I had assumed the word ‘assigned’ [Latin assignare: to sign to] meant the ID remained with the digital object (as a signature does on a legal text). Of course it is possible for a DO to be destroyed or lost (hopefully provenance would allow some sort of recovery), hence for me curation (and rich metadata) also involves the provenance information: [R1.2, R1, F2]. This is close to your ‘tombstone’ resolution, but with a backward reference allowing reconstruction. For the purposes of scoring FAIRness we (the group) need to decide whether the ID is an attribute of the data (or metadata), in which case it cannot be separated (referential and functional integrity) from the DO, or whether it can stand alone, perhaps with the PID kernel information (itself just a subset of the rich metadata associated with a DO). If the ID is an attribute of the data then in FAIRness terms it is binary – it exists or it does not (there may be concerns about universal uniqueness and eternal persistence, so this would convert the binary into a double scaling – one scale for each aspect).
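The two readings of F1 scoring discussed above can be sketched as code. This is purely an illustration of the distinction, not an agreed group metric: the field names (`identifier`, `uniqueness`, `persistence`) and the [0, 1] scales are my own assumptions.

```python
# Hypothetical sketch: F1 as a binary (ID is an inseparable attribute)
# versus F1 as a double scaling (one scale per aspect).

def score_f1_binary(record):
    """Reading 1: the ID either exists as an attribute of the DO or it does not."""
    return 1 if record.get("identifier") else 0

def score_f1_scaled(record):
    """Reading 2: split the binary into one scale per concern
    (global uniqueness, eternal persistence), each assumed in [0, 1]."""
    if not record.get("identifier"):
        return {"uniqueness": 0.0, "persistence": 0.0}
    return {
        "uniqueness": record.get("uniqueness", 0.0),
        "persistence": record.get("persistence", 0.0),
    }
```

The point of the sketch is that the group must choose one reading before any metric can be defined: the two functions return incomparable results for the same record.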
2. I agree with you about other information (log books etc.). However, these are (for me) objects in their own right and thus have a relationship to the original DO; that relationship may be recorded [I3] in the rich metadata [F2] associated with the DO. I disagree that these are ‘only’ ancillary information: for some purposes they may be primary data (e.g. when interpreted by an automated process or workflow) or metadata (providing information usable by the end-user, or by proxy software, to assess the relevance and quality of the DO to which they are linked as ancillary).
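The typed-relationship recording I have in mind for [I3] might look like this. The PIDs and the relation name `isDocumentedBy` are illustrative assumptions, not drawn from any ratified vocabulary:

```python
# Hypothetical sketch: a log book recorded as a first-class object linked to
# the DO by a typed relationship in the DO's rich metadata. Whether the log
# book acts as ancillary information, primary data, or metadata then depends
# on the consumer, not on the record itself.
do_metadata = {
    "pid": "hdl:21.11101/do-0001",          # illustrative PID, not a real handle
    "related": [
        {"pid": "hdl:21.11101/logbook-17",
         "relation": "isDocumentedBy"},      # qualified reference [I3]
    ],
}

def related_pids(metadata, relation):
    """Return PIDs linked to the DO by a given relationship type."""
    return [r["pid"] for r in metadata["related"] if r["relation"] == relation]
```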
3. Your ‘greater issue’ is indeed complex. Provenance can help but it implies very detailed provenance records which point to increasingly improved versions of the metadata. This brings us full circle to the question of whether FAIRness relates to the DO or to the ability of the metadata to grant FAIRness to the DO or to the metadata itself. I feel we need clarity on this in order to find appropriate measures or scores for FAIRness.
Just to add to the fun: is FAIRness a property of the metadata describing a web service that is catalogued (e.g. in the EOSC service catalogue) and that holds the URL of an executable web service, which in turn accesses a DO (a dataset) that is itself independently described by rich metadata at a computing resource (where the dataset and web service are curated), which is in turn also described by metadata – or is FAIRness assessed over the combination of the metadata for the web service, the DO and the computing resource?
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
———————————————————————————————————————————-
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
———————————————————————————————————————————-
From: margareta.hellstrom=***@***.***-groups.org On Behalf Of Maggie.Hellstrom
Sent: 22 April 2019 12:39
To: ***@***.***-groups.org
Subject: [fair_maturity] Re: [fair_maturity] Workshop #2 Report
Keith,
thank you very much for sharing your thoughts! I very much agree with most of your points, especially those regarding the need to set up and sustainably operate the necessary infrastructures for curation and storage, both for data and the associated metadata.
However, there are two areas where my views differ somewhat from what you outlined:
1) Having eternally persistent identifiers (whatever “eternal” might mean 😉) IMHO only says that the identifier itself – and the metadata stored with it in the identifier registry – should be persistent, not necessarily the (data) digital object or, indeed, some of the rich metadata that was originally associated with it. If the PID remains and hopefully resolves to a tombstone, then at least there remains some record of the now lost object.
There could be many reasons for the loss of both data and metadata, including accidents, retraction or a decision to cull old information. Sometimes the loss is a tragedy, but in other cases it’s just a natural part of the academic process. Related to FAIR, perhaps there should be a standard (or recommended) machine-actionable way (present in the PID kernel information?) to indicate that a DO is no longer accessible, and the reason for this?
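A machine-actionable tombstone of the kind suggested above could be a few extra fields in the PID kernel information. The field names below are illustrative assumptions, not part of any ratified kernel profile:

```python
# Hypothetical sketch: PID kernel record carrying a machine-actionable
# statement that the DO is no longer accessible, and why.
kernel_record = {
    "pid": "hdl:21.11101/do-0002",     # illustrative PID, not a real handle
    "status": "inaccessible",          # e.g. "active" | "inaccessible"
    "reason": "retracted",             # accident | retraction | culled | ...
    "tombstone_url": "https://repo.example.org/tombstone/do-0002",
}

def is_accessible(record):
    """A client can test accessibility without attempting retrieval."""
    return record.get("status") == "active"
```

The benefit is that harvesters and FAIRness evaluators could test `status` directly rather than interpreting a failed retrieval.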
2) My concept of metadata doesn’t just cover metadata stored in a cataloguing system, but also includes information that may be stored in separate documents – such as log books, measurement procedure descriptions, protocols, technical reports and the like. These should of course also be allocated PIDs and relevant cataloguing metadata to make them citable and most importantly linkable, but IMHO they shouldn’t be considered as data in their own right, but as ancillary information.
Thus, there should probably be slightly different approaches to both evaluation criteria and scoring/metrics when it comes to data and metadata. This, of course, might complicate things for both producers, curators/stewards and end users of data, but if much of the evaluation can eventually be done automatically, the collection of relevant inputs and application of tests on these might not be so much of a problem.
A much greater issue, it seems to me, concerns the temporal evolution of e.g. richness and quality of the metadata itself – both that stored in (repository & registry) catalogues and that contained in those ancillary sources. Here, I’m lacking guidance on how to update and maintain the FAIRness scores – both for a given data object and for the repository in which it resides. (Not to mention any services that are related to the discovery, retrieval and reuse of the data!)
Cheers,
Maggie
PS I think it is sufficient to only use the [fair_maturity] mailing list (***@***.***-groups.org) as recipient on any responses to this or the other discussion messages – if people start adding individual e-mail addresses for persons who are already members of the group (such as Makx, Keith and myself) , then they will receive multiple copies. If you are uncertain, please check the member list at the bottom of the page https://rd-alliance.org/groups/fair-data-maturity-model-wg !
——————
Associate Professor Margareta Hellström
ICOS Carbon Portal staff member
***@***.***
Lund University
Department of Physical Geography and Ecosystem Science
Sölvegatan 12, SE-22362 Lund, Sweden
Phone: +46-(0)46-2229683
________________________________
From: keith.jeffery=***@***.***-groups.org on behalf of ***@***.***
Sent: Monday, April 22, 2019 12:20
To: makxdekkers; ***@***.***-groups.org
Subject: Re: [fair_maturity] Workshop #2 Report
Makx –
I have now had time to re-read all the material from the last RDA Plenary session (unfortunately I had a clash with another meeting). I would like to make a few comments.
1. Beyond FAIR. I believe the items here are all covered by the existing FAIR principles, however not necessarily one for one. (This raises another question about the intersections of concepts in the FAIR principles – see below). To take the Beyond FAIR list from the GitHub:
a. Data Repository: a dataset may reside in a repository, but not necessarily. Consider streamed data from sensors or satellites. Some colleagues get excited about repository quality or ‘trusted repositories’; from my experience the only real criteria for FAIR use by a user concern the relevance and quality of the dataset for their purpose at that time (assuming no limitations on (re-)use) – and this can only be judged from the rich metadata describing the dataset.
b. Curation and maintenance: the DCC lifecycle (and associated DMP) is useful here. Curation involves difficult decisions based on the value of the dataset and the cost of curation. The real problem is the lack of economic models that deal with (potentially) infinite time. However, the FAIR requirement for an eternally persistent ID for the dataset implies curation (if a decision is taken to assign (manually or automatically) a persistent identifier, an implicit decision is taken to preserve the dataset). Curation implies a DMP, repository (possibly more than one), appropriate metadata, management responsibility, ownership responsibility, licensing, usage rules/constraints… all largely covered by what I understand by rich metadata.
c. Open Data: this is a very difficult term to define. Many conflate ‘open’ with ‘free’, and confuse both with open access. Strictly, open data concerns government data made available to citizens, although the term is used more widely, generally meaning some combination of open data, open access and free access (possibly subject to licence conditions such as acknowledgement). I believe the open data concept is covered by the FAIR principles.
d. Data Quality: as I suggest under (a) in my experience quality is determined by the end-user in the sense of appropriate quality for the current purpose. I suggest there is no absolute measure but only relative (to the purpose). Data quality is then determined by the intersection of the user requirement and the dataset quality as described by rich metadata. This is likely to include properties/attributes such as precision and accuracy although in some sciences the reputation of the experimental team is sufficient. The equipment used for data collection and the data collection/correction/summarisation method may also be relevant.
e. Others: this is of course not yet defined. I would hope that all could be accommodated by the existing FAIR principles because they are relatively abstract; as always the ‘devil is in the detail’ and this will be specified by interpretation towards concreteness of the FAIR principles.
I am hopeful that as other potential ‘beyond FAIR’ concepts arise they will increase our understanding of the several more concrete interpretations of each of the FAIR principles.
2. Intersections in FAIR principles:
a. I believe there is some confusion in the FAIR principles concerning data and metadata. Many of the principles start with (meta)data. While I subscribe to the principle that metadata is also data (library catalog cards are metadata to a researcher finding a particular paper but data to a librarian counting papers on the human genome), I fear the FAIR principles are not clear on what should be a property of the thing referred to (data) and what should be a property of the referring thing (metadata). The obvious example is the persistent identifier (I prefer UUPID): while both a dataset and the metadata describing it should have a UUPID, A1 is relevant for data (where the UUPID is an attribute in the metadata as specified in F4) but not really for metadata, where retrieval is usually by user-supplied values for metadata attributes.
b. I believe F2 and R1 are – for metadata – really the same principle and although different sets of attributes may be used for F and R there is likely to be a large intersection. For example R1.2 provenance metadata may well be highly relevant for a user finding appropriate (relevance, quality) datasets.
c. It seems to me that I2 and I3 concern aspects of I1 and could equally be I1.1 (a formal language should have its semantics defined) and I1.2 (a formal language should support qualified references; this is, for example, the advantage of RDF over XML).
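The qualified-reference point in (c) can be illustrated without any RDF tooling. The URLs and the Dublin Core predicate below are only examples of the pattern, not a claim about any particular dataset:

```python
# Hypothetical sketch: the same link as an unqualified reference (a bare URL,
# as in a plain XML href) versus a qualified one (an RDF-style triple whose
# predicate names the kind of relationship).
unqualified = "https://example.org/dataset/42"   # says nothing about *why* it is linked

qualified = (
    "https://example.org/paper/7",          # subject
    "http://purl.org/dc/terms/references",  # predicate: the semantics of the link
    "https://example.org/dataset/42",       # object
)

def relation_kind(triple):
    """Return the predicate, i.e. what the link means, machine-actionably."""
    return triple[1]
```

Software receiving `unqualified` can only follow the link; software receiving `qualified` can also reason about it, which is the I3 requirement.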
I raise these concerns now because – as we define progressively the metrics for assessing FAIRness – we have to be clear on exactly what is being measured.
Thanks for your patience!
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
From: mail=***@***.***-groups.org On Behalf Of makxdekkers
Sent: 17 April 2019 09:25
To: ***@***.***-groups.org
Subject: [fair_maturity] Workshop #2 Report
Dear members of the RDA FAIR Data Maturity Model Working Group,
We would like to thank you for attending the meeting of the Working Group in Philadelphia on 3 April 2019 and hope you found the meeting useful.
The report of the meeting is now available for download from the WG page on the RDA site at https://www.rd-alliance.org/workshop-2.
We are currently finalising a Google spreadsheet for your contributions to the development of the indicators for the FAIR principles following the approach presented at the meeting in Philadelphia, and we plan to share the spreadsheet with the Working Group in the coming days.
Many thanks!
Makx Dekkers
Editorial team
Maggie –
Many thanks for responding to my somewhat provocative email.
1. F1 states that (meta)data are assigned a globally unique eternally persistent identifier. I had assumed the word ‘assigned’ [Latin assignare: to sign to] meant the ID remained with the digital object (as a signature does on a legal text). Of course it is possible for a DO to be destroyed or lost (hopefully provenance would allow some sort of recovery) hence for me curation (and rich metadata) involves also the provenance information: [R1.2, R1, F2]. This is close to your ‘tombstone’ resolution but with backward reference allowing reconstruction. For the purposes of scoring FAIRness we (the group) need to decide whether the ID is an attribute of the data (or metadata) in which case it cannot be separated (referential and functional integrity) from the DO or if it can stand alone maybe with the PID kernel information (itself just a subset of the rich metadata associated with a DO). If the ID is an attribute of the data then in FAIRness it is a binary – it exists or it does not (there may be concerns about universal uniqueness, eternal persistence so this would convert the binary into a double scaling – one scale for each aspect).
2. I agree with you about other information (log books etc). However, these are (for me) objects in their own right and thus have a relationship to the original DO and the relationship may be recorded [I3] in the rich metadata [F2] associated with the DO. I disagree that these are ‘only’ ancillary information: for some purposes they may be primary data (e.g. interpreted as an automated process or workflow) or metadata (providing information usable by the end-user or proxy software to assess relevance and quality of the DO to which they are linked as ancillary).
3. Your ‘greater issue’ is indeed complex. Provenance can help but it implies very detailed provenance records which point to increasingly improved versions of the metadata. This brings us full circle to the question of whether FAIRness relates to the DO or to the ability of the metadata to grant FAIRness to the DO or to the metadata itself. I feel we need clarity on this in order to find appropriate measures or scores for FAIRness.
Just to add to the fun, is FAIRness a property of the metadata describing a web service which is catalogued (e.g. in the EOSC service catalog) and which has the URL to an executable web service which accesses a DO (dataset) which is also independently described by rich metadata at a location of a computing resource (where the dataset and web service are curated) which is also described by metadata – or is FAIRness assessed over the combination of the metadata for the webservice, the DO and the computing resource?
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
———————————————————————————————————————————-
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
———————————————————————————————————————————-
From: margareta.hellstrom=***@***.***-groups.org On Behalf Of Maggie.Hellstrom
Sent: 22 April 2019 12:39
To: ***@***.***-groups.org
Subject: [fair_maturity] Re: [fair_maturity] Workshop #2 Report
Keith,
thank you very much for sharing your thoughts! I very much agree with most of your points, especially those regarding the need to set up and sustainably operate the necessary infrastructures for curation and storage, both for data and the associated metadata.
However, there are two areas where my views might be somewhat differing from what you outlined:
1) Having eternally persistent identifiers (whatever “eternal” might mean 😉 IMHO only says that the identifier itself – and the metadata stored with it in the identifier registry – should be persistent, not necessarily the (data) digital object or, indeed, some of the rich metadata that was originally associated with it. If the PID rmains and hopefully resolves to a tombstone, then at least there remains some record of the now lost object.
There could be many reasons for the loss of both data and metadata, including accidents, retraction or a decision to cull old information. Sometimes the loss is a tragedy, but in other cases it’s just a natural part of the academic process. Related to FAIR, perhaps there should be a standard (or recommended) machine-actionable way (present in the PID kernel information?) to indicate that a DO is no longer accessible, and the reason for this?
2) My concept of metadata doesn’t just cover metadata stored in a cataloguing system, but also includes information that may be stored in separate documents – such as log books, measurement procedure descriptions, protocols, technical reports and the like. These should of course also be allocated PIDs and relevant cataloguing metadata to make them citable and most importantly linkable, but IMHO they shouldn’t be considered as data in their own right, but as ancillary information.
Thus, there should probably be slightly different approaches to both evaluation criteria and scoring/metrics when it comes to data and metadata. This, of course, might complicate things for both producers, curators/stewards and end users of data, but if much of the evaluation can eventually be done automatically, the collection of relevant inputs and application of tests on these might not be so much of a problem.
A much greater issue, it seems to me, concerns the temporal evolution of e.g. richness and quality of the metadata itself – bot that stored in (repository & registry) catalogues and that contained in those ancillary sources. Here, I’m lacking guidance on how to update and maintain the FAIRness scores – both for a given data object and for the repository in which it resides. (Not to mention any services that are related to the discovery, retrieval and reuse of the data!)
Cheers,
Maggie
PS I think it is sufficient to only use the [fair_maturity] mailing list (***@***.***-groups.org) as recipient on any responses to this or the other discussion messages – if people start adding individual e-mail addresses for persons who are already members of the group (such as Makx, Keith and myself) , then they will receive multiple copies. If you are uncertain, please check the member list at the bottom of the page https://rd-alliance.org/groups/fair-data-maturity-model-wg !
——————
Associate Professor Margareta Hellström
ICOS Carbon Portal staff member
***@***.***
Lund University
Department of Physical Geography and Ecosystem Science
Sölvegatan 12, SE-22362 Lund, Sweden
Phone: +46-(0)46-2229683
________________________________
– Show quoted text -From: keith.jeffery=***@***.***-groups.org on behalf of ***@***.***
Sent: Monday, April 22, 2019 12:20
To: makxdekkers; ***@***.***-groups.org
Subject: Re: [fair_maturity] Workshop #2 Report
Makx –
I have now had time to re-read all the material from the last RDA Plenary session (unfortunately I had a clash with another meeting). I would like to make a few comments.
1. Beyond FAIR. I believe the items here are all covered by the existing FAIR principles, however not necessarily one for one. (This raises another question about the intersections of concepts in the FAIR principles – see below). To take the Beyond FAIR list from the GitHub:
a. Data Repository: a dataset may reside in a repository but not necessarily. Consider streamed data from sensors or satellite. Some colleagues get excited about repository quality or ‘trusted repositories’; from m experience the only real criteria for FAIR use by a user concern the relevance and quality of the dataset for their purpose at that time (assuming no limitations on (re-)use) – and this can only be judged from the rich metadata describing the dataset.
b. Curation and maintenance: the DCC lifecycle (and associated DMP) is useful here. Curation involves difficult decisions based on the value of the dataset and the cost of curation. The real problem is the lack of economic models that deal with (potentially) infinite time. However, the FAIR requirement for an eternally persistent ID for the dataset implies curation (if a decision is taken to assign (manually or automatically) a persistent identifier an implicit decision is taken to preserve the dataset. Curation implies a DMP, repository (possibly more than one), appropriate metadata, management responsibility, ownership responsibility, licensing, usage rules/constraints… all largely covered by what I understand by rich metadata.
c. Open Data: this is a very difficult term to define. Many conflate open and free and confuse with open access. Open data strictly concerns government data made available to citizens although the term is used more widely and is generally used meaning open data, open access, free access (possibly subject to licence conditions such as acknowledgement). I believe the open data concept is covered by the FAIR principles.
d. Data Quality: as I suggest under (a) in my experience quality is determined by the end-user in the sense of appropriate quality for the current purpose. I suggest there is no absolute measure but only relative (to the purpose). Data quality is then determined by the intersection of the user requirement and the dataset quality as described by rich metadata. This is likely to include properties/attributes such as precision and accuracy although in some sciences the reputation of the experimental team is sufficient. The equipment used for data collection and the data collection/correction/summarisation method may also be relevant.
e. Others: this is of course not yet defined. I would hope that all could be accommodated by the existing FAIR principles because they are relatively abstract; as always the ‘devil is in the detail’ and this will be specified by interpretation towards concreteness of the FAIR principles.
I am hopeful that as other potential ‘beyond FAIR’ concepts arise they will increase our understanding of the several more concrete interpretations of each of the FAIR principles.
2. Intersections in FAIR principles:
a. I believe there is some confusion in the FAIR principles concerning data and metadata. Many of the principles start with (meta)data. While I subscribe to the principle that metadata is also data (library catalog cards are metadata to a researcher finding a particular paper but data to a librarian counting papers on the human genome) I fear the FAIR principles are not clear on what should be a property of the thing referred to(data) and what should be a property of the referring thing (metadata). The obvious example is persistent identifier (I prefer UUPID): while both a dataset and the metadata describing it should have a UUPID, A1 is relevant for data (where the UUPID is an attribute in the metadata as specified in F4) but not really for metadata where the retrieval is usually by user values for metadata attributes.
b. I believe F2 and R1 are – for metadata – really the same principle and although different sets of attributes may be used for F and R there is likely to be a large intersection. For example R1.2 provenance metadata may well be highly relevant for a user finding appropriate (relevance, quality) datasets.
c. It seems to me that I2 and I3 concern aspects of I1 and could equally be I1.1 (a formal language should have its semantics defined) and I1.2 (a formal language should support qualified references, this is, for example, the advantage of RDF over XML).
I raise these concerns now because – as we define progressively the metrics for assessing FAIRness – we have to be clear on exactly what is being measured.
Thanks for your patience!
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
———————————————————————————————————————————-
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
———————————————————————————————————————————-
From: mail=***@***.***-groups.org On Behalf Of makxdekkers
Sent: 17 April 2019 09:25
To: ***@***.***-groups.org
Subject: [fair_maturity] Workshop #2 Report
Dear members of the RDA FAIR Data Maturity Model Working Group,
We would like to thank you for attending the meeting of the Working Group in Philadelphia on 3 April 2019 and hope you found the meeting useful.
The report of the meeting is now available for download from the WG page on the RDA site at https://www.rd-alliance.org/workshop-2.
We are currently finalising a Google spreadsheet for your contributions to the development of the indicators for the FAIR principles following the approach presented at the meeting in Philadelphia, and we plan to share the spreadsheet with the Working Group in the coming days.
Many thanks!
Makx Dekkers
Editorial team
Maggie –
Many thanks for responding to my somewhat provocative email.
1. F1 states that (meta)data are assigned a globally unique eternally persistent identifier. I had assumed the word ‘assigned’ [Latin assignare: to sign to] meant the ID remained with the digital object (as a signature does on a legal text). Of course it is possible for a DO to be destroyed or lost (hopefully provenance would allow some sort of recovery) hence for me curation (and rich metadata) involves also the provenance information: [R1.2, R1, F2]. This is close to your ‘tombstone’ resolution but with backward reference allowing reconstruction. For the purposes of scoring FAIRness we (the group) need to decide whether the ID is an attribute of the data (or metadata) in which case it cannot be separated (referential and functional integrity) from the DO or if it can stand alone maybe with the PID kernel information (itself just a subset of the rich metadata associated with a DO). If the ID is an attribute of the data then in FAIRness it is a binary – it exists or it does not (there may be concerns about universal uniqueness, eternal persistence so this would convert the binary into a double scaling – one scale for each aspect).
2. I agree with you about other information (log books etc). However, these are (for me) objects in their own right and thus have a relationship to the original DO and the relationship may be recorded [I3] in the rich metadata [F2] associated with the DO. I disagree that these are ‘only’ ancillary information: for some purposes they may be primary data (e.g. interpreted as an automated process or workflow) or metadata (providing information usable by the end-user or proxy software to assess relevance and quality of the DO to which they are linked as ancillary).
3. Your ‘greater issue’ is indeed complex. Provenance can help but it implies very detailed provenance records which point to increasingly improved versions of the metadata. This brings us full circle to the question of whether FAIRness relates to the DO or to the ability of the metadata to grant FAIRness to the DO or to the metadata itself. I feel we need clarity on this in order to find appropriate measures or scores for FAIRness.
Just to add to the fun, is FAIRness a property of the metadata describing a web service which is catalogued (e.g. in the EOSC service catalog) and which has the URL to an executable web service which accesses a DO (dataset) which is also independently described by rich metadata at a location of a computing resource (where the dataset and web service are curated) which is also described by metadata – or is FAIRness assessed over the combination of the metadata for the webservice, the DO and the computing resource?
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
———————————————————————————————————————————-
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
———————————————————————————————————————————-
From: margareta.hellstrom=***@***.***-groups.org On Behalf Of Maggie.Hellstrom
Sent: 22 April 2019 12:39
To: ***@***.***-groups.org
Subject: [fair_maturity] Re: [fair_maturity] Workshop #2 Report
Keith,
thank you very much for sharing your thoughts! I very much agree with most of your points, especially those regarding the need to set up and sustainably operate the necessary infrastructures for curation and storage, both for data and the associated metadata.
However, there are two areas where my views might be somewhat differing from what you outlined:
1) Having eternally persistent identifiers (whatever “eternal” might mean 😉 IMHO only says that the identifier itself – and the metadata stored with it in the identifier registry – should be persistent, not necessarily the (data) digital object or, indeed, some of the rich metadata that was originally associated with it. If the PID rmains and hopefully resolves to a tombstone, then at least there remains some record of the now lost object.
There could be many reasons for the loss of both data and metadata, including accidents, retraction or a decision to cull old information. Sometimes the loss is a tragedy, but in other cases it’s just a natural part of the academic process. Related to FAIR, perhaps there should be a standard (or recommended) machine-actionable way (present in the PID kernel information?) to indicate that a DO is no longer accessible, and the reason for this?
2) My concept of metadata doesn’t just cover metadata stored in a cataloguing system, but also includes information that may be stored in separate documents – such as log books, measurement procedure descriptions, protocols, technical reports and the like. These should of course also be allocated PIDs and relevant cataloguing metadata to make them citable and most importantly linkable, but IMHO they shouldn’t be considered as data in their own right, but as ancillary information.
Thus, there should probably be slightly different approaches to both evaluation criteria and scoring/metrics when it comes to data and metadata. This, of course, might complicate things for producers, curators/stewards and end users of data alike, but if much of the evaluation can eventually be done automatically, the collection of relevant inputs and application of tests on these might not be so much of a problem.
A much greater issue, it seems to me, concerns the temporal evolution of e.g. the richness and quality of the metadata itself – both that stored in (repository & registry) catalogues and that contained in those ancillary sources. Here, I’m lacking guidance on how to update and maintain the FAIRness scores – both for a given data object and for the repository in which it resides. (Not to mention any services that are related to the discovery, retrieval and reuse of the data!)
Cheers,
Maggie
PS I think it is sufficient to only use the [fair_maturity] mailing list (***@***.***-groups.org) as recipient on any responses to this or the other discussion messages – if people start adding individual e-mail addresses for persons who are already members of the group (such as Makx, Keith and myself), then they will receive multiple copies. If you are uncertain, please check the member list at the bottom of the page https://rd-alliance.org/groups/fair-data-maturity-model-wg !
——————
Associate Professor Margareta Hellström
ICOS Carbon Portal staff member
***@***.***
Lund University
Department of Physical Geography and Ecosystem Science
Sölvegatan 12, SE-22362 Lund, Sweden
Phone: +46-(0)46-2229683
________________________________
From: keith.jeffery=***@***.***-groups.org on behalf of ***@***.***
Sent: Monday, April 22, 2019 12:20
To: makxdekkers; ***@***.***-groups.org
Subject: Re: [fair_maturity] Workshop #2 Report
Makx –
I have now had time to re-read all the material from the last RDA Plenary session (unfortunately I had a clash with another meeting). I would like to make a few comments.
1. Beyond FAIR. I believe the items here are all covered by the existing FAIR principles, however not necessarily one for one. (This raises another question about the intersections of concepts in the FAIR principles – see below). To take the Beyond FAIR list from the GitHub:
a. Data Repository: a dataset may reside in a repository, but not necessarily. Consider streamed data from sensors or satellites. Some colleagues get excited about repository quality or ‘trusted repositories’; from my experience the only real criteria for FAIR use by a user concern the relevance and quality of the dataset for their purpose at that time (assuming no limitations on (re-)use) – and this can only be judged from the rich metadata describing the dataset.
b. Curation and maintenance: the DCC lifecycle (and associated DMP) is useful here. Curation involves difficult decisions based on the value of the dataset and the cost of curation. The real problem is the lack of economic models that deal with (potentially) infinite time. However, the FAIR requirement for an eternally persistent ID for the dataset implies curation: if a decision is taken to assign (manually or automatically) a persistent identifier, an implicit decision is taken to preserve the dataset. Curation implies a DMP, repository (possibly more than one), appropriate metadata, management responsibility, ownership responsibility, licensing, usage rules/constraints… all largely covered by what I understand by rich metadata.
c. Open Data: this is a very difficult term to define. Many conflate ‘open’ with ‘free’, and confuse both with open access. Strictly, open data concerns government data made available to citizens, although the term is used more widely and generally means some combination of open data, open access and free access (possibly subject to licence conditions such as acknowledgement). I believe the open data concept is covered by the FAIR principles.
d. Data Quality: as I suggest under (a) in my experience quality is determined by the end-user in the sense of appropriate quality for the current purpose. I suggest there is no absolute measure but only relative (to the purpose). Data quality is then determined by the intersection of the user requirement and the dataset quality as described by rich metadata. This is likely to include properties/attributes such as precision and accuracy although in some sciences the reputation of the experimental team is sufficient. The equipment used for data collection and the data collection/correction/summarisation method may also be relevant.
e. Others: this is of course not yet defined. I would hope that all could be accommodated by the existing FAIR principles because they are relatively abstract; as always the ‘devil is in the detail’ and this will be specified by interpretation towards concreteness of the FAIR principles.
I am hopeful that as other potential ‘beyond FAIR’ concepts arise they will increase our understanding of the several more concrete interpretations of each of the FAIR principles.
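[Editorial sketch: the curation- and quality-related attributes listed under (b) and (d) above could, purely as an illustration, be gathered into one machine-readable rich-metadata record. The schema below is invented for this sketch and does not follow any existing metadata standard.]

```python
# Illustrative rich-metadata record for a dataset. All field names
# are assumptions made for this example, not a real standard.
dataset_metadata = {
    "identifier": "doi:10.1234/example",   # persistent identifier (F1)
    "title": "Example sensor time series",
    "licence": "CC-BY-4.0",                # usage licence (R1.1)
    "provenance": {                        # how the data were made (R1.2)
        "equipment": "hypothetical sensor model X",
        "method": "hourly averaging of raw readings",
    },
    "quality": {"precision": "0.1 K", "accuracy": "0.5 K"},
    "owner": "Example Research Group",
    "management": {"repository": "repo.example.org", "dmp": "dmp-0042"},
}

def fairness_relevant_fields(record):
    """List the top-level attributes present in the record – the raw
    material an automated FAIRness assessment would test against its
    indicators."""
    return sorted(record.keys())

print(fairness_relevant_fields(dataset_metadata))
```

The point of the sketch is that relevance and quality judgements (and any automated scoring) operate on fields like these, rather than on the dataset bytes themselves.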
2. Intersections in FAIR principles:
a. I believe there is some confusion in the FAIR principles concerning data and metadata. Many of the principles start with (meta)data. While I subscribe to the principle that metadata is also data (library catalog cards are metadata to a researcher finding a particular paper but data to a librarian counting papers on the human genome), I fear the FAIR principles are not clear on what should be a property of the thing referred to (data) and what should be a property of the referring thing (metadata). The obvious example is the persistent identifier (I prefer UUPID): while both a dataset and the metadata describing it should have a UUPID, A1 is relevant for data (where the UUPID is an attribute in the metadata as specified in F4) but not really for metadata, where retrieval is usually by user-supplied values for metadata attributes.
b. I believe F2 and R1 are – for metadata – really the same principle and although different sets of attributes may be used for F and R there is likely to be a large intersection. For example R1.2 provenance metadata may well be highly relevant for a user finding appropriate (relevance, quality) datasets.
c. It seems to me that I2 and I3 concern aspects of I1 and could equally be I1.1 (a formal language should have its semantics defined) and I1.2 (a formal language should support qualified references; this is, for example, the advantage of RDF over XML).
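[Editorial sketch: the difference between a bare link and a qualified reference (I3) can be shown with plain data structures. The relation name below is invented for illustration and does not come from any vocabulary cited in this thread.]

```python
# A bare reference only asserts that two objects are somehow related:
bare_reference = ("dataset-A", "dataset-B")

# A qualified reference also states *how* they are related, which is
# what I3 asks for (the relation name "isDerivedFrom" is illustrative):
qualified_reference = {
    "subject": "dataset-A",
    "predicate": "isDerivedFrom",   # the qualification
    "object": "dataset-B",
}

def relation_of(ref):
    """Return the stated relation, or None for an unqualified link."""
    if isinstance(ref, dict):
        return ref.get("predicate")
    return None

print(relation_of(bare_reference))       # prints: None
print(relation_of(qualified_reference))  # prints: isDerivedFrom
```

In RDF the qualification is carried by the predicate of each triple, which is the advantage over a plain XML cross-reference mentioned above.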
I raise these concerns now because – as we progressively define the metrics for assessing FAIRness – we have to be clear on exactly what is being measured.
Thanks for your patience!
Best
Keith
——————————————————————————–
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
———————————————————————————————————————————-
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
———————————————————————————————————————————-
From: mail=***@***.***-groups.org On Behalf Of makxdekkers
Sent: 17 April 2019 09:25
To: ***@***.***-groups.org
Subject: [fair_maturity] Workshop #2 Report
Dear members of the RDA FAIR Data Maturity Model Working Group,
We would like to thank you for attending the meeting of the Working Group in Philadelphia on 3 April 2019 and hope you found the meeting useful.
The report of the meeting is now available for download from the WG page on the RDA site at https://www.rd-alliance.org/workshop-2.
We are currently finalising a Google spreadsheet for your contributions to the development of the indicators for the FAIR principles following the approach presented at the meeting in Philadelphia, and we plan to share the spreadsheet with the Working Group in the coming days.
Many thanks!
Makx Dekkers
Editorial team