Submission - Preservation


Karlsruhe Institute of Technology (KIT) Policies

(Rainer, Jan 26, 2013)

Large Scale Data Facility

LSDF is divided into a structured and an unstructured area.

Unstructured LSDF:

  • full backups + incremental backups

Structured LSDF:

  • full backups + incremental backups
  • regular MD5 checks
  • files as well as corresponding metadata are checked regularly (in preparation)
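
The regular MD5 checks listed above could look roughly like the following minimal sketch. It assumes a manifest that maps file paths to previously recorded MD5 digests. The manifest layout and the function names `md5_of_file` and `check_manifest` are hypothetical and not taken from the LSDF setup.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest):
    """Compare recorded checksums with freshly computed ones.

    `manifest` maps file path -> expected hex MD5 (hypothetical layout).
    Returns a dict mapping each path to "ok", "mismatch", or "missing".
    """
    report = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif md5_of_file(path) == expected:
            report[path] = "ok"
        else:
            report[path] = "mismatch"
    return report
```

A scheduler such as cron would typically run a check like this at fixed intervals and alert on any entry that is not "ok"; how LSDF actually schedules its checks is not stated here.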


DARIAH

The mission of DARIAH is to enhance and support digitally-enabled research across the humanities and arts. DARIAH aims to develop and maintain an infrastructure in support of ICT-based research practices.

  • files will be replicated according to user requirements (in preparation)
  • regular MD5 checks

Allianz zur Forschungsdatenhaltung

German alliance aiming to preserve the most valuable and most important research data.

  • files are replicated across all 3 participating data centres
  • in each centre:
    • full backups + incremental backups
    • regular MD5 checks
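
With three replicas and a checksum per centre, a damaged copy can in principle be identified by comparing the digests. The sketch below illustrates that idea under the assumption that each centre reports one MD5 per file. The majority rule and the function name `find_divergent_replicas` are illustrative assumptions, not a documented procedure of the alliance.

```python
from collections import Counter


def find_divergent_replicas(checksums_by_centre):
    """Return the centres whose checksum differs from the majority value.

    `checksums_by_centre` maps a centre name -> hex MD5 reported for one file.
    Raises ValueError when no strict majority exists, because the intact
    copy cannot then be decided from the checksums alone.
    """
    counts = Counter(checksums_by_centre.values())
    majority_value, majority_count = counts.most_common(1)[0]
    if majority_count * 2 <= len(checksums_by_centre):
        raise ValueError("no majority checksum; manual inspection needed")
    return sorted(
        centre
        for centre, value in checksums_by_centre.items()
        if value != majority_value
    )
```

A centre returned by this function would be a candidate for restoring the file from one of the agreeing replicas or from its local backup.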

CINECA Policies

(Cacciari, Fiameni, Feb 14, 2013)

Data Facility

DF is divided into several areas; those listed below are regulated by preservation policies.

User Homes:

  • full backups + incremental backups daily (tape deduplication)

Repository:

  • full backups + incremental backups (in preparation)
  • regular MD5 checks
  • local replicas of some data sets (two copies) to protect against RAID parity loss
  • use of PIDs (Persistent Identifiers) based on the Handle System

Structured data DB:

  • full backups + incremental backups (in preparation)

ESFRI-Project EPOS (EUDAT) + other projects

  • files are replicated across 2 data centres
  • in each centre:
    • full backups + incremental backups (in preparation)
    • MD5 checks at replication time
    • regular MD5 checks (in preparation)
    • use of PIDs (Persistent Identifiers) based on the Handle System
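
An MD5 check at replication time can be sketched as follows. A local file copy stands in for the transfer between data centres, which in the real setup would go through iRODS; the function names `file_md5` and `replicate_with_check` are hypothetical.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_with_check(source, target):
    """Copy `source` to `target` and verify the copy by comparing MD5 digests.

    Returns the verified checksum. On a mismatch the faulty copy is removed
    and OSError is raised, so a corrupt replica is never left in place.
    """
    expected = file_md5(source)
    shutil.copyfile(source, target)  # stand-in for the inter-centre transfer
    actual = file_md5(target)
    if actual != expected:
        os.remove(target)
        raise OSError(f"checksum mismatch for {target}: {actual} != {expected}")
    return expected
```

The returned checksum is the value that could later be reused by the regular MD5 checks, so the file does not need to be hashed again when the replica is registered.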

The following archive contains the EUDAT/CINECA iRODS policies for PID management: Eudat_policies.tar.gz


iRODS Files: 1

Filename               Size      Date created         Date modified
Eudat_policies.tar.gz  417.1 kb  2013-03-01 11:35:26  2013-03-01 12:06:43


Max Planck Institute for Psycholinguistics (MPI-PL) Policies

(Willem Elbers, 13 June 2013)

Our policies are described in our Data Seal of Approval.

With respect to preservation, policy 11, "The data repository ensures the integrity of the digital objects and the metadata", is the most relevant.


"MD5 checksums are calculated for all objects and checked periodically. The availability of files on the file system is checked automatically daily. The availability of the archive access tools is checked automatically multiple times a day. The availability of file, web and application servers is monitored continuously. New versions of archived resources can be deposited, in which case the old versions will be moved to a version archive. In the future these old versions will also be made available to the end users but this is currently not yet the case."


Checksums (md5sum) for all our objects are stored in our PID (handle) records and used to check integrity.

The repository software maintains versions when files are edited. New versions are assigned a new PID.


Within the EUDAT Safe Replication service, a checksum is stored in the PID record of each replica, together with a reference to the PID record at the RoR (repository of record).
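
The relationship between a replica's PID record and the RoR record can be illustrated with a small data model. This is only a sketch: the field names (`pid`, `url`, `checksum`, `ror`) and the in-memory dictionary are assumptions made for illustration and do not reflect the actual Handle record layout used by EUDAT.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PidRecord:
    """Simplified stand-in for a PID (handle) record; field names are hypothetical."""
    pid: str
    url: str
    checksum: str               # hex MD5 of the object this record points to
    ror: Optional[str] = None   # PID of the repository-of-record entry; None for the original


def verify_replica(records: Dict[str, PidRecord], replica_pid: str) -> bool:
    """Check a replica's stored checksum against the checksum in its RoR record."""
    replica = records[replica_pid]
    if replica.ror is None:
        raise ValueError(f"{replica_pid} has no RoR reference and is not a replica")
    original = records[replica.ror]
    return replica.checksum == original.checksum
```

Because every replica record points back to the RoR record, an integrity check only needs the replica's PID to find the authoritative checksum.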