The EPISA project (Entity and Property Inference for Semantic Archives) is part of the ongoing renewal of DGLAB’s existing data infrastructure and aims, amongst other goals, to develop a prototype for an open-source knowledge graph platform adopting a new data model for archival description. Content integration is among its key features, as we intend to create a flexible data model that can both interoperate with other information systems and accommodate information regarding cultural resources other than archival documents.
The Portuguese National Archives are managed by DGLAB (Direção-Geral do Livro, dos Arquivos e das Bibliotecas), a public administration body responsible for the management of several information systems that support its mission to safeguard, enhance and promote governmental and public records, as well as other historical documents in its custody. It holds the most relevant cultural heritage collection, largely digitized and accessed both by history researchers and by laypeople from all the Portuguese-speaking countries and beyond.
About 3.5 million metadata records are currently held in a system designed 20 years ago according to the standards of the International Council on Archives (ICA). These records consist mainly of textual descriptions of the context and contents of the documents, but the amount of born-digital information in the system is growing.
Due to the complexity of the paradigm shift involved in representing archival information in a linked data model, this project is also devoted to finding ways to guarantee the effective migration of contents stored according to ICA standards to an ontology-based model, which requires both the use of existing crosswalks and the inference of new relations with semi-automated methods. The need for a new generation of description tools that includes libraries, archives and museums – more fine-grained, more flexible and especially more machine-actionable – led to the choice of CIDOC-CRM (ISO 21127:2014) as the root ontology. The role of DGLAB as a large archival institution (it integrates the headquarters in Lisbon and the majority of the district archives) and also as a regulator for the state, municipal and private archives is a guarantee of the impact of the project results. We anticipate that the proposed change in cultural heritage metadata will give users a better knowledge of the repository and improved tools for more flexible and richer retrieval, as well as a stronger presence in aggregators, both in cultural heritage and elsewhere, as we ensure compliance with the FAIR data principles.
EPISA, financed by the Portuguese Foundation for Science and Technology (FCT), is a collaboration between INESC TEC - Institute for Systems and Computer Engineering, Technology and Science, as principal contractor, and DGLAB and the University of Évora, as participating institutions.
Maria José de Almeida
INESC TEC, FEUP email@example.com