Use Cases

This page lists the use cases the WG considers relevant to its scope. The use cases set the frame for further work on defining types, the API, and other deliverables. The WG outcomes will most likely not enable every use case in full detail, but will focus on specific parts or generic building blocks.

The list of use cases has been stable for some time now. Additions can be made, though the current efforts must progress within a defined scope to come to useful outcomes at M18.

The use cases are much broader than the scope of the WG, making the list valuable to a potential IG. All use cases presuppose that the objects in question (data, metadata, ...) bear PIDs. An important metaphor regarding the scope of this WG is the mail envelope: if access always returns the data in an envelope, what must be written on the envelope to enable the use cases without opening the envelope and looking at the data?

You can find all PDFs for the use case documents at the bottom of the page and also a zip file containing editable office documents.

Data replication (view use case)

A master copy of data is replicated at several locations. The purpose of replication is providing safe backups. The replicas are not accessed unless the original copy is lost and must be restored. PID types can help by providing the means to record alternate locations and to distinguish master from copy objects.
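As a minimal sketch of what such a record could look like, the following Python example stores a role and the cross-references between master and replicas. The class `PidRecord`, its field names, and all PIDs and URLs are hypothetical and chosen only for illustration; they are not WG outcomes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PidRecord:
    """Hypothetical PID record; all field names are illustrative assumptions."""
    pid: str
    location: str                       # where this copy is stored
    role: str = "master"                # "master" or "replica"
    master_pid: Optional[str] = None    # set on replicas only
    replica_pids: List[str] = field(default_factory=list)  # set on the master only


def restore_locations(master: PidRecord, registry: Dict[str, PidRecord]) -> List[str]:
    """Return the locations of all registered replicas of a master copy."""
    return [
        registry[pid].location
        for pid in master.replica_pids
        if pid in registry and registry[pid].role == "replica"
    ]


# Made-up PIDs and URLs, for illustration only.
master = PidRecord("example/1000", "https://site-a.example/obj/1000",
                   replica_pids=["example/1001", "example/1002"])
registry = {
    "example/1000": master,
    "example/1001": PidRecord("example/1001", "https://site-b.example/obj/1001",
                              role="replica", master_pid="example/1000"),
    "example/1002": PidRecord("example/1002", "https://site-c.example/obj/1002",
                              role="replica", master_pid="example/1000"),
}
backup_sites = restore_locations(master, registry)
```

If the master copy is lost, `backup_sites` lists where a restore could start, and each replica record still points back to the master PID.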

Subscribers: EUDAT

Data access load leveling (view use case)

A master copy of data is distributed to several other locations. Users can access the data from any of these locations, taking load off the original location. PID types can help by providing the means to record alternate locations with emphasis on efficient lookup.
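One possible way to spread requests over the recorded locations is sketched below. Round-robin selection is an assumption made for this example, as are the class name `LocationResolver`, the PID, and the URLs; the use case itself only asks for efficient lookup of alternate locations.

```python
import itertools
from typing import Dict, Iterator, List


class LocationResolver:
    """Hands out the registered locations of a PID in round-robin order."""

    def __init__(self, locations_by_pid: Dict[str, List[str]]) -> None:
        # One endless iterator per PID; the dict lookup is O(1) on average.
        self._cycles: Dict[str, Iterator[str]] = {
            pid: itertools.cycle(list(locations))
            for pid, locations in locations_by_pid.items()
            if locations
        }

    def resolve(self, pid: str) -> str:
        """Return the next location for the PID, or raise KeyError if unknown."""
        if pid not in self._cycles:
            raise KeyError(f"no locations registered for {pid}")
        return next(self._cycles[pid])


# Made-up PID and URLs, for illustration only.
resolver = LocationResolver({
    "example/2000": ["https://origin.example/d/2000",
                     "https://mirror.example/d/2000"],
})
picks = [resolver.resolve("example/2000") for _ in range(3)]
```

Three consecutive requests alternate between the origin and the mirror, so the original location serves only part of the traffic.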

This use case is an extension of the data replication use case.

Subscribers: ESGF

Format obsolescence audit (view use case)

Over time, file formats become obsolete or unavailable. To avoid losing access to needed content in these formats, the content must be checked, and resources at risk of obsolescence must be individually identified. PID types can help by providing the necessary format information.
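Assuming each PID record carries a format field, an audit could run over the records alone, without touching the data. The sketch below uses a hypothetical `format` key, made-up PIDs, and an example list of at-risk formats that a real audit would take from a format registry.

```python
from typing import Dict, List, Set


def audit_formats(records: Dict[str, dict], at_risk: Set[str]) -> List[str]:
    """Return the PIDs whose recorded format is in the at-risk set, sorted."""
    return sorted(
        pid for pid, info in records.items()
        if info.get("format") in at_risk
    )


# Made-up records; the "format" key is an assumed PID type.
records = {
    "example/3001": {"format": "application/x-netcdf"},
    "example/3002": {"format": "application/vnd.lotus-1-2-3"},
    "example/3003": {"format": "application/vnd.lotus-1-2-3"},
    "example/3004": {},  # no format recorded, so it cannot be flagged here
}
flagged = audit_formats(records, at_risk={"application/vnd.lotus-1-2-3"})
```

Records without format information, such as `example/3004`, are not flagged by this check; a real audit would probably report them separately.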

Subscribers: DataConservancy

Versioning (view use case)

Data is stored in a repository and receives a PID. At a later point in time, a new version of the same data becomes available. When it is stored in the repository, the questions are what to do with the old copy and its PID, whether to assign a new PID, and if so, how to manage and interlink the two PIDs. PID types may help to establish the link between the PIDs of the old and new versions, or to specify which policies apply in this process.
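If each version receives its own PID, the link between versions could be recorded on the envelope and followed by a client. The sketch below assumes a hypothetical `next_version` field and made-up PIDs; whether old PIDs are kept at all is a policy question that the code does not answer.

```python
from typing import Dict


def latest_version(pid: str, records: Dict[str, dict]) -> str:
    """Follow 'next_version' links until a record without a successor is found."""
    seen = {pid}
    while True:
        successor = records[pid].get("next_version")
        if successor is None:
            return pid
        if successor in seen:
            raise ValueError("version links form a cycle")
        seen.add(successor)
        pid = successor


# Made-up version chain: v1 -> v2 -> v3.
versions = {
    "example/4000-v1": {"next_version": "example/4000-v2"},
    "example/4000-v2": {"next_version": "example/4000-v3"},
    "example/4000-v3": {},
}
newest = latest_version("example/4000-v1", versions)
```

A user who cites the PID of the first version can still be pointed to the newest one, while the old PID keeps resolving to the object that was actually cited.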

Subscribers: DKRZ/WDCC, DARIAH/CINES, DWD

Composite objects (view use case)

No consensus exists regarding the level of granularity at which PIDs are assigned to data objects. Different usage scenarios require different granularities, so it must be possible to structure PIDs hierarchically. If both the individual objects and the larger composite receive PIDs, the relations between them should be discoverable by humans and also by machine agents that, for example, copy or analyze objects.
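To illustrate how a machine agent might discover such relations, the sketch below walks a hypothetical `has_part` field from a composite down to its members. The field name, the PIDs, and the depth-first order are assumptions of this example.

```python
from typing import Dict, List


def collect_members(root: str, records: Dict[str, dict]) -> List[str]:
    """Return all PIDs reachable from root via 'has_part', depth-first, root excluded."""
    members: List[str] = []
    seen = {root}
    # Reverse so that parts are visited in the order in which they are listed.
    stack = list(reversed(records.get(root, {}).get("has_part", [])))
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # guards against shared parts and accidental cycles
        seen.add(pid)
        members.append(pid)
        stack.extend(reversed(records.get(pid, {}).get("has_part", [])))
    return members


# Made-up hierarchy: a collection with two datasets, one of which has two files.
hierarchy = {
    "example/collection": {"has_part": ["example/dataset-a", "example/dataset-b"]},
    "example/dataset-a": {"has_part": ["example/file-1", "example/file-2"]},
    "example/dataset-b": {},
}
all_members = collect_members("example/collection", hierarchy)
```

An agent that copies the collection could use `all_members` to make sure that every part is transferred along with the composite.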

Subscribers: DKRZ/WDCC, DARIAH/GWDG, DARIAH/sldr.org

Managing data objects and metadata objects in combination (view use case)

If there are distinct data and metadata objects and these are deposited together in a repository, each of them may receive an individual PID. PID types may help by distinguishing data objects from metadata objects and by connecting them with each other.

Subscribers: DKRZ/WDCC

Managing object access permissions (view use case)

If a data object bears a PID, an access control system component may want to be able to quickly decide on a user's access request by looking at the envelope and evaluating criteria such as time-based access or the total number of times the object has already been accessed.
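A decision based on the envelope alone might look like the following sketch. The keys `embargo_until`, `max_accesses`, and `access_count` are hypothetical names for the two criteria mentioned above; a real access control component would combine them with user identity and further policies.

```python
from datetime import date


def may_access(envelope: dict, today: date) -> bool:
    """Decide on an access request using only the information on the envelope."""
    embargo_until = envelope.get("embargo_until")
    if embargo_until is not None and today < embargo_until:
        return False  # time-based restriction still in force
    max_accesses = envelope.get("max_accesses")
    if max_accesses is not None and envelope.get("access_count", 0) >= max_accesses:
        return False  # access quota used up
    return True


# Made-up envelope: embargoed until 1 June 2015, at most 100 accesses.
envelope = {
    "embargo_until": date(2015, 6, 1),
    "max_accesses": 100,
    "access_count": 42,
}
before_embargo_ends = may_access(envelope, date(2015, 5, 31))
after_embargo_ends = may_access(envelope, date(2015, 6, 1))
```

The object itself is never opened; both checks read only the fields stored with the PID.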

Subscribers: DataNet Federation Consortium

Manage write control (view use case)

In a repository that organizes objects through collections, policies exist that demand that users do not modify such collections arbitrarily. For instance, adding new collection elements must be restricted, and for this purpose, a ticketing system is used that is ultimately based on PIDs.

Subscribers:

Custom Data Citation (view use case)

If an author, editor, or other content creator wishes to succinctly reference multiple resources with existing identifiers in a single citation, it may be helpful to have a mechanism to create an aggregation of these identifiers that can be referenced via a single PID.
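Such an aggregation could be as simple as a record that lists the member identifiers under a new PID. In the sketch below, the function name, the record layout, and all identifiers are hypothetical, and the new PID is passed in rather than minted, since PID registration is outside the scope of the example.

```python
from typing import Iterable


def make_citation_aggregate(aggregate_pid: str, member_pids: Iterable[str]) -> dict:
    """Build an aggregation record that bundles existing identifiers under one PID."""
    members = list(dict.fromkeys(member_pids))  # drop duplicates, keep first-seen order
    if not members:
        raise ValueError("an aggregation needs at least one member")
    return {"pid": aggregate_pid, "type": "aggregation", "members": members}


# Made-up identifiers, for illustration only.
citation = make_citation_aggregate(
    "example/cite-5000",
    ["example/5001", "example/5002", "example/5001"],
)
```

The citation then needs to carry only `example/cite-5000`, and a reader who resolves it obtains the full list of referenced resources.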

Subscribers:

Modifying data (view use case)

In scientific data infrastructures, individual objects are frequently modified and redistributed for reasons such as error correction or regular recomputation. Such changes should, however, be accountable in order to meet user expectations, ease maintenance, and ensure a fundamental level of service quality. Replacing or modifying objects should therefore be properly managed.

Subscribers:

Provenance Tracing (view use case)

In scientific data infrastructures, individual objects are frequently modified and re-distributed for reasons such as error correction or regular recomputation. Such situations should however be accountable to meet user expectations, ease maintenance and ensure a fundamental level of service quality. Replacing or modifying objects should be properly managed.

Subscribers: