INTRODUCTORY NOTE FOR THE GROUP BY Gail & Enrique
Hi everybody, we attach the first draft of Principle 5 (Metadata).
We (Gail & Enrique) got somewhat stuck on understanding what the consensus of our group is on the following question: what is the purpose of rights metadata with respect to research data?
Possible answers could be:
- to clarify ownership of the data;
- to clarify what legal or policy mechanisms govern the use of the data (e.g., copyright, moral rights, database rights, contractual obligations, etc.);
- to document the presence of a license governing reuse (e.g., CC-BY 2.0 or GNUxxx or …?)
- to explicate the terms and conditions provided to users via that license (e.g., the content of the license);
- Something else?
- All of the above?
Our impression is that the original idea was mainly twofold:
First, to ensure that metadata are always available and inextricably linked to the data, and that their availability cannot be limited by treating the metadata themselves as something protected by intellectual property rights (IPR). This ensures that data can be re-used without further human-to-human or machine-to-human interaction, and sometimes that the data can be accessed at all: where the data are not text but more complex hard-science datasets, carefully designed metadata are needed, for example including references to the software needed to make sense of the data; and
Second (or perhaps equally first), to explicate via machine-readable means the terms and conditions provided to users through specific licenses (e.g., the content of the license), especially in the case of public domain and/or open licenses.
Our document points readers to examples of metadata standards that contain some form of rights metadata, but the robustness and detail of the rights information provided varies from one standard to the next. For example, Dublin Core’s dc.rights element generally contains a copyright notice or the type of CC license associated with the digital resource. The latter may not give a potential user sufficient information to contact the rights holder for additional permissions should the proposed use not conform with the license conditions. A copyright notice does not, in and of itself, tell a potential user what jurisdiction the owner resides in and which country’s copyright laws apply, creating uncertainty around the presence of moral rights. Also, a public domain resource will not have a copyright notice: how will a user know whether the resource is PD, or is copyrighted but carries no notice?
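The gaps described above can be made concrete with a minimal sketch. It assumes a simplified record in which only the `rights` key mirrors the Dublin Core rights element; the `rights_holder_contact` and `jurisdiction` keys and the `rights_gaps` helper are hypothetical names used purely for illustration.

```python
# Hypothetical illustration: list the legal questions a rights statement leaves open.
# Only "rights" mirrors a Dublin Core element; the other keys are assumptions.

def rights_gaps(record):
    """Return a list of open legal questions for a simplified metadata record."""
    gaps = []
    rights = (record.get("rights") or "").strip()
    if not rights:
        # No notice at all: public domain, or copyrighted without a notice?
        gaps.append("no rights statement: public domain or unmarked copyright?")
    if not record.get("rights_holder_contact"):
        gaps.append("no contact for requesting additional permissions")
    if rights and not record.get("jurisdiction"):
        gaps.append("jurisdiction unknown: applicable law and moral rights unclear")
    return gaps


cc_only = {"title": "Survey data", "rights": "CC-BY 2.0"}
unmarked = {"title": "Old map scans"}

for rec in (cc_only, unmarked):
    print(rec["title"], "->", rights_gaps(rec))
```

A record that names only the license is reported as lacking a contact and a jurisdiction, while a record with no statement at all raises the public domain question.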
In sum, if the purpose of rights metadata is to remove legal uncertainty (and thus any barriers to use), our group needs to come to consensus on what types of legal uncertainty are barriers to sharing and reusing research data and recommend metadata approaches to address those types. This is the approach the PREMIS group took in devising the rights metadata elements in that standard and, in my opinion, PREMIS represents one of the more thoughtful and robust approaches to conveying rights information to users (in this case, preservationists).
But we also included more abstract principles that could be used for harmonizing approaches to metadata use for specifying those rights (points 1 to 5).
However, other related issues have all been placed at the end in brackets so that the group can decide whether to address them. They are: a) software metadata (point 6); b) machine-readable technologies other than metadata for expressing rights (point 7); and c) the legal status of metadata (point 8), since for me it is not so clear that all metadata is fact and non-protectable.
By the way, please also notice that after drafting the guidelines and re-reading the principle itself we suggest a different wording. The group had originally agreed to try not to revisit the text of the principles themselves, but we opted to offer the alternative draft of Principle 5.
END OF INTRODUCTORY NOTE
Improve metadata to enhance legal interoperability. The metadata for any publicly available dataset should include all relevant information on data ownership (if any), rights, licenses, and restrictions, ideally in machine-readable form using available standards. All metadata for a dataset should be freely accessible with no legal restrictions imposed on the use of the metadata.
[When drafting the guidelines, Gail and Enrique realized that a different wording of Principle 5 itself might be more to the point:
Improve metadata to enhance legal interoperability. The metadata for any publicly available dataset should include all information necessary to understand the legal ownership of the data and any terms and conditions governing its access and reuse. To ensure interoperability across disparate online systems, rights metadata accompanying datasets should be made available in machine-readable form using available standards. Additionally, all metadata describing a dataset should be freely accessible with no legal restrictions imposed on its reuse.]
Metadata -- the structured descriptions of data sets and data services that facilitate their discovery, assessment, inventory, and use -- are the principal mechanism through which Principle 4 on transparency and certainty can be achieved. Today there is a wide diversity of metadata standards and conventions in use to describe digital information resources. Many of these standards are developed by particular communities of practice and apply to discipline- or genre-specific documents and/or datasets. For example, AVM - Astronomy Visualization Metadata Scheme facilitates cross-searching of astronomical imagery collections rendered from telescopes. The DDI - Data Documentation Initiative is an international standard for describing data from the social, behavioral, and economic sciences. The FGDC/CSDGM - Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata is widely used across disciplines to describe data containing explicit geospatial references (Digital Curation Centre, 2015, Disciplinary Metadata, http://www.dcc.ac.uk//resources/metadata-standards).
The majority of established metadata standards in use today do not include elements that convey the legal information users need to clearly understand their rights and responsibilities in reusing data appropriately. Rather, information about ownership rights and usage terms and conditions is only loosely coupled to the dataset files, in the form of copyright notices or open licenses posted on the dataset landing page. Legal information governing data reuse is therefore easily disconnected from the data and lost, leaving potential users concerned about violating the law. Removing legal uncertainty surrounding data re-use requires consistent and predictable rights information that remains associated with the data assets being used. Rights metadata helps both humans and machines confidently reuse data assets without concerns of infringement or breach of license.
Models for Presenting Legal Information in Metadata
A number of widely-used metadata standards do contain one or more “rights” elements, and their sponsoring communities provide guidelines on how to populate rights fields most effectively. For example, the Data Documentation Initiative used with social science data contains the field 126.96.36.199 [elaborate]. The Dublin Core metadata scheme commonly used with digital library and Web resources also includes a rights field [elaborate]. And the PREMIS metadata standard, applied in digital preservation to convey the administrative information necessary to steward digital resources over time, includes the blahblahblah element, with recommendations to populate it in xxxx way. Additionally, the Data Catalog Vocabulary (DCAT), Description of a Project (DOAP), and myExperiment (“myExperiment base Ontology,” rdf.myexperiment.org. [Online]) incorporate a rights field. The Australian National Data Service (ANDS), for instance, has utilized this additional information to allow filtering of data by license. All of these models represent positive steps towards recognition of this issue.
In any case, these examples show that, because there are multiple interpretations of the public domain, multiple open licenses, and far more ordinary private licenses, it is almost impossible to compile a complete list of metadata addressing openness or restrictions.
So, some principles can be asserted:
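As a rough illustration of why a populated rights field matters, the sketch below filters a small catalogue by license, in the spirit of the ANDS filtering mentioned above. It is not ANDS code: the catalogue, the flat `license` key (holding a license URL, much as DCAT records a license link for a distribution), and both helper functions are assumptions made for the example.

```python
# Hypothetical catalogue records; "license" holds a license URL.
CATALOGUE = [
    {"title": "Rainfall 2010", "license": "https://creativecommons.org/licenses/by/4.0/"},
    {"title": "Household survey", "license": "https://creativecommons.org/publicdomain/zero/1.0/"},
    {"title": "Legacy scans"},  # no rights metadata recorded
]


def filter_by_license(records, license_url):
    """Return the records whose license field matches the given URL exactly."""
    return [r for r in records if r.get("license") == license_url]


def without_license(records):
    """Return the records that carry no license information at all."""
    return [r for r in records if not r.get("license")]


cc_by = filter_by_license(CATALOGUE, "https://creativecommons.org/licenses/by/4.0/")
print([r["title"] for r in cc_by])
print([r["title"] for r in without_license(CATALOGUE)])
```

Records with no license field cannot be matched by any filter, which is the practical face of the legal uncertainty discussed in this section.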
1.- The temptation to “create” a single “open science” or rather “open access” metadata format should be resisted, since there are different definitions of these terms and the concepts are too broad for everybody to agree on what their real implications would be.
2.- The easiest approach seems to be to distinguish between access and use/re-use, understanding by access the visualization of, or the possibility of reading, the text in which the data is embedded, and by use/re-use all other rights covered by copyright (anything beyond reading in the case of classic scientific literature, paper or digitized, or viewing in the case of other formats in which data is collected, e.g., images, maps, mathematical expressions etc., such as displaying, navigating, zooming in/out, overlaying viewable data sets or displaying legend information and any relevant content of metadata; downloading; and transforming or enabling data sets to be transformed in order to achieve –even providing open source software when needed).
3.- Standardization of these types of metadata for initial access (read/visualize) in classic scientific literature (paper or digitized) seems to point to a potential universal indication (with or without time frames added under the YYYY/DD/MM format) as a minimum ideal reflecting open access in the metadata; an equivalent should be developed for other formats that imply visualization instead of reading.
4.- Re-use rights could be further referenced by a corresponding indication, through which:
a) classic open licenses (including the CC Public Domain Mark) could be expressed in a machine-readable way, although in many cases it might be unavoidable to use human-readable formats [in which case the metadata should point to a human-readable license (although this could also be only a first step, then linking from the human-readable license to the legal license document text or a structured/tagged machine-readable license)];
b) real public domain and other waivers could also be referred to via equivalent ad hoc harmonized indicators;
c) duplication or multiplication of additional similar indicators could be used to protect components of a content item (e.g., figures in a journal article or book chapter) by adding more URL indicators specifically applied to the component; and
d) an additional machine-readable URL metadata element could be provided when any part of the article is licensed under a different license from that governing the article as a whole (e.g. images, graphics, data re-usable under certain licensed software…).
5.- Fully limited resources (limited by paywalls, or simply because the data resource is non-shared or limited-shared) should ideally have a common harmonized indication (or something similar) pointing directly to the specific conditions of the particular license via hyperlinks, instead of complicating and lengthening metadata expressions by maintaining a register of the most usual private copyright licenses (although the indicators can be incorporated directly into an existing schema or use a standalone schema and reference it via namespace). Metadata thus has its limitations, even as it helps overcome the problems of finding and interpreting data. This simple single metadata expression could facilitate the incorporation through metadata of the realities of the current situation: data collections protected by various families of often-incompatible copyright licenses.
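Points 3 to 5 can be sketched as a small data structure: one access indication with an optional start date, a list of license links for the item as a whole, and further license links for individual components. All field names, URLs under example.org, and both helper functions below are hypothetical; they are not taken from any existing schema.

```python
from datetime import date

# Hypothetical record: an access indicator plus dated license links.
ARTICLE = {
    "open_access_from": date(2016, 1, 1),  # None would mean no open access
    "licenses": [  # links governing the item as a whole
        {"url": "https://example.org/licenses/publisher-terms", "start": date(2015, 1, 1)},
        {"url": "https://creativecommons.org/licenses/by/4.0/", "start": date(2016, 1, 1)},
    ],
    "components": {  # links applying only to a named component
        "figure-2": [
            {"url": "https://example.org/licenses/third-party-image", "start": date(2015, 1, 1)},
        ],
    },
}


def is_open_access(record, on):
    """True if the access indicator is present and already in force on the date."""
    start = record.get("open_access_from")
    return start is not None and start <= on


def license_in_force(record, on, component=None):
    """Return the license URL applying on the date; a component-level link wins."""
    entries = record.get("components", {}).get(component) or record["licenses"]
    active = [e for e in entries if e["start"] <= on]
    if not active:
        return None
    # The most recently started license is the one in force.
    return max(active, key=lambda e: e["start"])["url"]


print(license_in_force(ARTICLE, date(2016, 6, 1)))
print(license_in_force(ARTICLE, date(2016, 6, 1), component="figure-2"))
```

The record only carries links: the conditions of a private license stay in the linked document rather than in a register inside the metadata, as point 5 suggests.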
[6.- In case we also want to address software openness expressed through metadata:
A similar approach can be applied not only to data but also to software, whether free software (which respects “the four liberties”), open source software (which meets the 10 criteria of the Open Source Initiative) or FOSS/FLOSS. Good metadata is always important when releasing new components to a large community: it helps others to use your work and increases its longevity by attaching licenses to the documentation (GNU FDL, MPL, Apache, BSD, MIT…) and to web sites, especially if persistent identifiers are used [URIs, sometimes paid ones, e.g. the digital object identifier (DOI)]. However, even open access sources can become confusing because of the reciprocity condition, which might weaken, limit or even suppress copyleft. In the case of software, pure non-licensed waivers are usually not accepted, so references to licenses are more useful than in the case of data. The white lists of FOSS/FLOSS licenses are, though, more streamlined than the many open and public licenses for data and publications (see the licenses made available by the Free Software Foundation or listed as “open source” by the Open Source Initiative), but more and more infrastructures create new ones themselves. In any case, a simple line of metadata with the appropriate link to the repository platform could suffice. This simple use of URIs should be especially promoted when reproducing (or, more generally, re-using) the data requires particular software: the metadata of the data repository (and of the scientific article too) should lead as easily as possible to the FOSS/FLOSS, if it exists (see e.g. the CERN-run Zenodo).
And of course, software may also be protected by patents, depending on the national jurisdiction.
For additional software related platforms see Archimer, arXiv, DataCite, DANS, DOAJ, DRYAD, Edinburgh Research Archive, EGI Applications Database, Episciences, EUDAT, exec&share, GBIF, GitHub, Google code, HAL, IPOL, Journal of Open Research Software (JORS), nanoHUB, OpenAIRE, OpenDOAR, OpenEdition, ORBi (U. Liège), Projet PLUME, RE3DATA, RECOLECTA, Research Papers in Economics, ResearchCompendia, RunMyCode, Software Sustainability Institute, SourceForge, swMath, zbMath, Zenodo and many others]
[7.- In case we want to address the issue of existing machine-readable alternatives to metadata expression of public domain, waivers or licensed rights limiting access to the data:
Metadata is not the only usable tool. There are also Rights Expression Languages (REL) that aim to encode restrictions on the use of content and that provide formal, machine-readable expressions of copyright, usually by creating a controlled vocabulary of verbs that stand for restricted actions (see Open Digital Rights Language (ODRL), MPEG-21, METSRights…), but their interoperability is almost zero; so metadata are clearly the best option] [until apps, the additional alternative, are developed]
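To show what a “controlled vocabulary of verbs” looks like in practice, here is a minimal sketch of a policy loosely modelled on the JSON-LD form of ODRL, with permissions and prohibitions expressed as action verbs on a target. The identifiers under example.org are placeholders, and the `decide` function is a hypothetical helper, not part of ODRL or of any library.

```python
# A policy loosely modelled on ODRL's JSON-LD serialization (illustrative only).
POLICY = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "https://example.org/policy/1",
    "permission": [
        {"target": "https://example.org/dataset/42", "action": "reproduce"},
        {"target": "https://example.org/dataset/42", "action": "distribute"},
    ],
    "prohibition": [
        {"target": "https://example.org/dataset/42", "action": "modify"},
    ],
}


def decide(policy, target, action):
    """Return 'prohibited', 'permitted' or 'not stated' for a target/action pair."""
    def matches(rules):
        return any(r["target"] == target and r["action"] == action for r in rules)

    # Prohibitions are checked first so that they override permissions.
    if matches(policy.get("prohibition", [])):
        return "prohibited"
    if matches(policy.get("permission", [])):
        return "permitted"
    return "not stated"


for verb in ("reproduce", "modify", "display"):
    print(verb, "->", decide(POLICY, "https://example.org/dataset/42", verb))
```

Any verb the policy does not mention comes back as "not stated", which illustrates how such an expression can still leave a user without a clear answer.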
[8.-ADDITIONAL CAVEAT: DO WE ADDRESS AS AN INTRODUCTION THE ISSUE OF WHETHER ALL METADATA IS FACT AND, THUS, NON PROTECTABLE?
IF SO, AN INITIAL DRAFT COULD BE THE FOLLOWING:
Legal status of metadata itself
In principle metadata is about facts, so it is not protectable. This covers information about author, editor, title, publisher, publishing date and place, identification of the main publication, number of pages, etc., as well as further information such as format information, identifiers (ISBN, LCCN, OCLC numbers…), information about funding, the storage media, size, administrative data (e.g. last change of the data set), relevant links, indexes, links to digitized extracts of a text (indexes, registers, tables of quoted literature), address and other contact details of the author(s), covers, abstracts, reviews, summaries, indices, subject indexes, notations, user-generated tags, signatures, and … information about the copyright and license status. [Please notice that some of these “complementary materials” can be protectable in principle. This applies especially to some data annotations or expressive literary works like summaries, abstracts, reviews, or graphical creations like book covers. Even classification systems can be protected (if they represent an individual/original intellectual achievement)].
So, among these complementary data, information about the public domain or copyright and license status of the metadata itself can certainly be included.
But in general metadata, and certainly metadata on public domain or copyright and license or waiver status, will not be protected under copyright law. Such data is already in the public domain, so its protection status need not be changed in order to grant unrestricted freedom of use. That means a) that there is no need to license or waive rights to release metadata into the public domain; and b) that imposing more restrictive license conditions on the use of metadata should always be considered copyfraud (i.e., an improper practice, since it would mean claiming copyright on, or restricting the use of, intellectual goods that are not actually protected). ]