WGDC Pilots

Pilots: Adoption of the WGDC Recommendations

The following table lists pilots implementing the RDA WGDC recommendations. Further details on each pilot are available by clicking its name in the table.

WG Pilots

| Name | Data | Type | Status | Notes |
|---|---|---|---|---|
| CSV-Reference | CSV / SQL | reference | running | Reference implementation |
| Natural History Museum London | RDBMS | operational | finished | |
| TIMBUS | RDBMS | research | finished | Sensor data |
| XML-Reference | XML | research | finished | eXist-DB |
| DEXHELPP | CSV/RDBMS | research | running | Social security data |
| Git-Reference | ASCII | reference | running | Reference implementation |
| VAMDC | SQL/NoSQL/ASCII/XML | deployment | running | Distributed data center |
| CBMI@wustl | RDBMS | deployment | starting | Integration into i2b2 |
| CCCA | NetCDF | deployment | finished | Climate scenarios data |
| ACDH | RDBMS, LoD | deployment | starting | Thesaurus |
| ARGO | NetCDF | deployment | planned | ODIP-II |
| BCO-DMO | CSV | deployment | planned | |
| ENVRIplus | | deployment | running | |
| Ocean Networks Canada | Data streams | deployment | starting | Oceanographic data |
| ... | CSV, RDBMS | deployment | planning | Conceptual evaluation, seeking funding |

Short Template

- Pilot name:
- Contact person:
- Type: research pilot / reference implementation / operational system
- Status: finished / active / starting / planned
- Type of data: (RDBMS, XML, CSV, file-based, other)
- Dynamics: (very high-frequency (microseconds) / very frequent (minutes) / frequent (daily/hourly) / sometimes (every other month) / rarely / none)
- Domain:
- Short description:
- Solution / approach:
- Timeline:
- Supplementary material: slides, reports, screenshots, papers, software, ...

Details Template

RDA Data Citation Recommendations and their Application in the CSV Reference Implementation

A.    Preparing the Data and the Query Store

  • R1 – Data Versioning: For retrieving earlier states of data sets the data needs to be versioned.
  • R2 – Timestamping: Ensure that operations on data are timestamped, i.e. any additions and deletions are marked with a timestamp.
  • R3 – Query Store: Provide means to store the queries used to select data and associated metadata.
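
As an illustration of R1 to R3, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from the CSV reference implementation; the table names (`records`, `query_store`), the column names, and the helper functions are assumptions made for this example. Rows are never physically removed: a deletion only closes a validity interval, so earlier states can still be retrieved.

```python
import sqlite3

def make_store():
    """Create an in-memory database with a versioned data table and a query store."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE records ("
        " id INTEGER, value TEXT,"
        " valid_from TEXT NOT NULL,"  # R2: timestamp of the insertion
        " valid_to TEXT)"             # R2: timestamp of the deletion, NULL while current
    )
    con.execute(  # R3: hypothetical query store layout
        "CREATE TABLE query_store ("
        " pid TEXT PRIMARY KEY, query TEXT, query_checksum TEXT,"
        " result_checksum TEXT, executed_at TEXT)"
    )
    return con

def insert(con, rec_id, value, ts):
    con.execute("INSERT INTO records VALUES (?, ?, ?, NULL)", (rec_id, value, ts))

def delete(con, rec_id, ts):
    # R1: keep the old version and only mark it as no longer valid
    con.execute(
        "UPDATE records SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, rec_id),
    )

def update(con, rec_id, value, ts):
    # An update is modelled as a deletion followed by an insertion
    delete(con, rec_id, ts)
    insert(con, rec_id, value, ts)

def as_of(con, ts):
    """Return the data set as it existed at timestamp ts (ISO 8601 strings in UTC)."""
    cur = con.execute(
        "SELECT id, value FROM records"
        " WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)"
        " ORDER BY id",
        (ts, ts),
    )
    return cur.fetchall()
```

The sketch relies on ISO 8601 timestamps in a single time zone, which compare correctly as plain strings; a production system would more likely use the temporal features of its database.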

B.    Persistently Identify Specific Data Sets
         When a data set is to be persisted, the following steps need to be applied:

  • R4 – Query Uniqueness: Re-write the query to a normalised form so that identical queries can be detected. Compute a checksum of the normalised query to efficiently detect identical queries.
  • R5 – Stable Sorting: Ensure an unambiguous sorting of the records in the data set.
  • R6 – Result Set Verification: Compute a checksum of the query result set to enable verification of the correctness of a result upon re-execution.
  • R7 – Query Timestamping: Assign a timestamp to the query either based on the last update to the entire database or the last update to the selection of data affected by the query or the query execution time. This allows retrieving the data as it existed at query time.
  • R8 – Query PID: Assign a new PID to the query if either the query is new or if the result set returned from an earlier identical query is different due to changes in the data. Otherwise, return the existing PID.
  • R9 – Store Query: Store the query and its metadata (e.g. PID, original and normalised query, query and result set checksums, timestamp, superset PID, data set description and others) in the query store.
  • R10 – Citation Text: Provide a recommended citation text and the PID to the user.
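
The steps R4 to R9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalisation is deliberately naive (it collapses whitespace and would also alter whitespace inside string literals), the in-memory `query_store` dictionary stands in for a real query store, and the `example:` PID scheme is hypothetical; a real deployment would obtain PIDs from a PID service.

```python
import hashlib
import re

query_store = {}  # hypothetical in-memory query store: PID -> metadata (R9)

def normalise_query(sql):
    """R4: naive normalisation (collapse whitespace, drop a trailing semicolon)."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()

def checksum(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def result_checksum(rows):
    """R5 and R6: bring the rows into a fixed order, then hash them."""
    ordered = sorted(rows)  # assumes rows are tuples of mutually comparable values
    return checksum("\n".join(repr(row) for row in ordered))

def persist_query(sql, rows, timestamp):
    """Return the PID for a query, reusing an existing one where possible (R8)."""
    normalised = normalise_query(sql)
    q_sum = checksum(normalised)
    r_sum = result_checksum(rows)
    for pid, meta in query_store.items():
        # Same query and same result set: return the existing PID
        if (meta["query_checksum"], meta["result_checksum"]) == (q_sum, r_sum):
            return pid
    pid = "example:" + checksum(q_sum + r_sum)[:12]  # hypothetical PID scheme
    query_store[pid] = {
        "original_query": sql,
        "normalised_query": normalised,
        "query_checksum": q_sum,
        "result_checksum": r_sum,
        "timestamp": timestamp,  # R7: time to which the query refers
    }
    return pid
```

Calling `persist_query` twice with differently formatted but identical queries over unchanged data returns the same PID, while a changed result set leads to a new PID.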

C.    Upon Request of a PID

  • R11 – Landing Page: PIDs should resolve to a human-readable landing page of the data set, which provides metadata including a link to the superset (PID of the data source) and a citation text snippet.
  • R12 – Machine Actionability: The landing page should be machine-actionable and allow retrieving the data set by re-executing the timestamped query.
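
A possible shape for R11 and R12 is sketched below. The metadata keys (`superset_pid`, `citation`, `query`, `timestamp`) and the `run_query` callback are assumptions for this example; `run_query(query, timestamp)` is expected to execute the stored query against the data as it existed at the given time.

```python
def landing_page(pid, query_store):
    """R11: collect the metadata shown on the landing page of a PID."""
    meta = query_store[pid]
    return {
        "pid": pid,
        "superset_pid": meta["superset_pid"],  # link to the data source
        "citation": meta["citation"],          # recommended citation text
        "timestamp": meta["timestamp"],
    }

def retrieve(pid, query_store, run_query):
    """R12: re-execute the timestamped query to obtain the cited data set."""
    meta = query_store[pid]
    return run_query(meta["query"], meta["timestamp"])
```

Keeping the two functions separate reflects the idea that the same PID serves both a human reader (metadata and citation text) and a program (the data itself).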

D.    Upon Modifications to the Data Infrastructure

  • R13 – Technology Migration: When data is migrated to a new representation (e.g. new database system, a new schema or a completely different technology), the queries and associated checksums need to be migrated.
  • R14 – Migration Verification: Successful query migration should be verified by ensuring that queries can be re-executed correctly.
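
The verification in R14 could be sketched as follows: every stored query is re-executed on the migrated system and its result checksum is compared with the stored one. The function names, the metadata keys, and the `run_query` callback are assumptions for this example. The sketch also assumes that the row representation is unchanged by the migration; if it changes, the stored checksums would have to be recomputed as part of the migration (R13).

```python
import hashlib

def result_checksum(rows):
    """Hash a result set in a fixed order (assumes comparable row tuples)."""
    text = "\n".join(repr(row) for row in sorted(rows))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_migration(query_store, run_query):
    """Return the PIDs whose re-executed result no longer matches the stored checksum."""
    failures = []
    for pid, meta in query_store.items():
        # run_query is a hypothetical callback bound to the migrated system
        rows = run_query(meta["query"], meta["timestamp"])
        if result_checksum(rows) != meta["result_checksum"]:
            failures.append(pid)
    return failures
```

An empty list indicates that all stored queries could be re-executed with identical results on the new system.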