DCU-DANS Interoperability Project

The Digital Curation Unit - IMIS, Athena Research Center
The Digital Curation Unit (DCU) was launched at the “Athena” Research and Innovation Centre in Information, Communication and Knowledge Technologies in January 2007 with the mission of conducting research, developing technologies and applications, providing services and training, and acting as a national focal point in the field of digital curation. Since June 2009 the DCU has been a department of the Institute for the Management of Information Systems (IMIS) of the “Athena” Research Centre.
Digital curation encompasses a set of activities aiming at the production of high-quality, dependable digital assets; their organisation, archiving and long-term preservation; and the generation of added value from digital assets by means of resource-based knowledge elicitation. To ensure the adequate capture of the context of digital resources and their subsequent creative and effective use, the DCU adopts a multidisciplinary approach which considers the full lifecycle of digital assets, such as records, digital surrogates and scholarly/scientific datasets. This approach favours a multi-faceted analysis of issues and the combined use of methods, techniques and tools from the fields of informatics (especially areas such as knowledge representation and management, knowledge extraction, ontology engineering, multimedia data management, Web technologies, digital libraries and workflow management); management and decision sciences (especially areas such as workflow analysis, reliability analysis and cost-benefit analysis); library science and archives management; material culture studies, museology and communications; epistemology; and law.
Project Proposal
1. Goal
The proposed project will deliver a de-duplication service for humanities-related content, focusing on authors and datasets. The service will use available information such as subject terms, descriptions and possibly geo-spatial information to identify duplicates, first among authors and then among datasets. The datasets to be used are those of DANS (NARCIS and possibly EASY). The main focus will be on authors, because there is probably little duplication at the dataset level in this specific content. The service will work on XML-encoded data and will be able to “understand” the metadata schemas found in the humanities domain.
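To make the idea of author de-duplication more concrete, the following is a minimal sketch in Python, not the method the project commits to. It assumes that author names arrive in a "Surname, Given names" form and that each author record carries a list of subject terms; the record layout, the function names and the 0.3 threshold are all hypothetical choices made for illustration. Two records become a candidate pair when their normalised names match and their subject terms overlap sufficiently (Jaccard similarity).

```python
import re
import unicodedata
from itertools import combinations


def normalise_name(name):
    """Reduce 'Surname, Given names' to a 'surname initial' key (hypothetical rule)."""
    # Strip diacritics so that 'Müller' and 'Muller' produce the same key.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    surname, _, given = ascii_name.partition(",")
    surname = re.sub(r"[^a-z ]", "", surname.lower()).strip()
    given = re.sub(r"[^a-z ]", "", given.lower()).strip()
    # Keep only the first initial of the given names.
    return f"{surname} {given[:1]}".strip()


def jaccard(a, b):
    """Jaccard similarity of two collections of terms (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_duplicates(authors, threshold=0.3):
    """Return (id, id, score) for author pairs that look like the same person.

    `authors` is a list of dicts with the assumed keys 'id', 'name', 'subjects'.
    """
    pairs = []
    for left, right in combinations(authors, 2):
        if normalise_name(left["name"]) != normalise_name(right["name"]):
            continue
        score = jaccard(left["subjects"], right["subjects"])
        if score >= threshold:
            pairs.append((left["id"], right["id"], score))
    return pairs


# Invented sample data, for illustration only.
authors = [
    {"id": "a1", "name": "Jansen, Pieter", "subjects": ["archaeology", "pottery"]},
    {"id": "a2", "name": "Jansen, P.", "subjects": ["archaeology", "pottery", "roman pottery"]},
    {"id": "a3", "name": "Jansen, P.", "subjects": ["linguistics"]},
]
print(candidate_duplicates(authors))
```

In this sample, a1 and a2 are paired because they share two of three distinct subject terms, while a3 is left out because its subjects do not overlap with the others. The pairwise comparison is quadratic in the number of records, so a real service would probably first group records by the normalised name key; descriptions and geo-spatial information could be added as further evidence in the same way.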
2. Service description 
The service will take as input XML records harvested automatically by an OAI-PMH 2.0 harvester. It will report its findings through a Web-based UI and a REST interface. An RDF export will also be provided, linking authors together through their native identifiers found in OAI-PMH. The service will work with specific schemas; the initial set of supported schemas is: oai_dc, MODS, CARARE, nl_didl.
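The input and export steps described above can be sketched as follows, again only as an illustration. The sketch parses an OAI-PMH ListRecords response in the oai_dc schema using the Python standard library, and renders pairs of author identifiers as RDF in N-Triples syntax. The sample response, the example.org identifiers and the function names are invented, and the use of owl:sameAs as the linking predicate is an assumption: the proposal does not name a predicate. No network access is performed; a real harvester would also have to follow resumption tokens and handle the other schemas.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0, oai_dc and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Assumed linking predicate; the proposal does not specify one.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Invented sample response, trimmed to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Excavation report</dc:title>
          <dc:creator>Jansen, Pieter</dc:creator>
          <dc:subject>archaeology</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def texts(parent, path):
    """Collect the non-empty text values of all elements matching `path`."""
    return [e.text.strip() for e in parent.findall(path, NS) if e.text and e.text.strip()]


def parse_records(xml_text):
    """Extract identifier, creators and subjects from each oai_dc record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc_root = record.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is None:
            # Records flagged as deleted carry a header but no metadata.
            continue
        records.append({
            "identifier": identifier.strip(),
            "creators": texts(dc_root, "dc:creator"),
            "subjects": texts(dc_root, "dc:subject"),
        })
    return records


def same_as_triples(pairs):
    """Render (uri_a, uri_b) pairs as N-Triples lines linking the two authors."""
    return [f"<{a}> <{OWL_SAME_AS}> <{b}> ." for a, b in pairs]


for rec in parse_records(SAMPLE_RESPONSE):
    print(rec)
# Hypothetical author URIs, for illustration only.
for line in same_as_triples([("http://example.org/author/1", "http://example.org/author/2")]):
    print(line)
```

owl:sameAs asserts full identity between the two resources, which may be stronger than an automatic matcher can justify; a weaker predicate, or a confidence score attached to each link, could be used instead.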