This document describes mechanisms by which the interoperability of data is ensured through the use of standards. The standards we cover are both domain-related (the archival standards in XML formats such as EAD, EAC-CPF and EAG) and transversal standards, whose use is recommended in any digital project, in particular the ISO standards for the representation of languages, scripts and countries. Interoperability of archival descriptions expressed in EAD is made possible by the specification of a dedicated EAD profile for EHRI. This profile is built and maintained using the TEI-ODD framework, which is explained in the first section of the report. Interoperability and reusability of EHRI resources are also ensured by the design of more consistent URLs, composed with standardised methods and using ISO reference codes. This design should be seen as a first step towards a persistent identifier system. The work initiated in WP11 and presented in this document will be continued, enhanced and developed by other EHRI work packages: WP7 Virtual Access to EHRI Virtual Observatory, WP10 Resource Identification and Integration Workflows and WP13 Research Data Infrastructures for Holocaust Material.
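As a rough illustration of how a URL might be composed from standardised parts, the following Python sketch joins an ISO 3166-1 alpha-2 country code, a repository identifier and a local unit identifier into a single path segment. Everything specific here is an assumption made for the example: the base URL, the order and separator of the segments, the small country list, and the function names `slugify` and `build_unit_url` are hypothetical and do not reproduce the scheme actually used by EHRI.

```python
import re
import unicodedata

# Hypothetical subset of ISO 3166-1 alpha-2 codes; a real system would load the full list.
KNOWN_COUNTRIES = {"de", "fr", "nl", "pl", "us"}

# Placeholder base URL, not the address of any real portal.
BASE_URL = "https://example.org/units"


def slugify(value: str) -> str:
    """Normalise an identifier into a lowercase, URL-safe token."""
    # Strip diacritics by decomposing characters and dropping non-ASCII marks.
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of other characters into a single underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_value.lower()).strip("_")


def build_unit_url(country: str, repository_id: str, local_id: str) -> str:
    """Compose a URL from a country code, a repository id and a local id."""
    country = country.lower()
    if country not in KNOWN_COUNTRIES:
        raise ValueError(f"unknown ISO 3166-1 alpha-2 code: {country!r}")
    return f"{BASE_URL}/{country}-{slugify(repository_id)}-{slugify(local_id)}"


if __name__ == "__main__":
    print(build_unit_url("DE", "002429", "R 58/1234"))
```

Rejecting unknown country codes early, rather than silently passing them through, keeps malformed identifiers out of the URL space, which matters if the URLs are later meant to evolve into persistent identifiers.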
In this paper we present some ideas on the use of archival standards in various contexts that exemplify the complexity of such standards and provide users with innovative ways to handle EAD content. Our main idea is that researchers, cultural heritage institutions, archival portals and standards maintenance bodies could greatly benefit from a multiscale modelling of archival data, but also from multiscale representations and documentation. A first step is under way in the management of heterogeneous archival sources within one single environment, namely a federated portal such as that of EHRI. We built a methodology based on a specification and customisation method inspired by the long-standing experience of the Text Encoding Initiative (TEI) community. In the TEI framework, one can define project-specific subsets or extensions of the TEI guidelines while maintaining both the technical (XML schemas) and editorial (documentation) specification within a single framework. Using the same framework for EAD data allows us to express precise content-oriented rules and opens up the possibility of integrating the human-readable documentation into the validation process.
Umanistica Digitale, No 4 (2019): Data Sharing, Holocaust Documentation and the Digital Humanities: Best Practices, Case Studies and Benefits
One of the funded project proposals under DARIAH’s Open Humanities call 2015 was “Open History: Sustainable digital publishing of archival catalogues of twentieth-century history archives”. Based on the experiences of the Collaborative EuropeaN Digital Archival Research Infrastructure (CENDARI) and the European Holocaust Research Infrastructure (EHRI), the main goal of the “Open History” project was to enhance the dialogue between (meta-)data providers and research infrastructures. Integrating archival descriptions (where they were already available) held at a wide variety of twentieth-century history archives, from classic archives to memorial sites, libraries and private archives, into research infrastructures has proven to be a major challenge, which could not be met without pre-processing or other preparatory work, ranging from limited to extensive. The “Open History” project organized two workshops and developed two tools: an easily accessible and general article on why the practice of standardization and sharing is important and how this can be achieved; and a model which provides checklists for the self-analysis of archival institutions. The text that follows is the article we have developed. It intentionally remains at a general level, without much jargon, so that it can be easily read by those who are neither archivists nor IT specialists. Hence, we hope it will be easy to understand both for those who describe the sources at various archives (with or without IT or archival science degrees) and for decision-makers (directors and advisory boards) who wish to understand the benefits of investing in standardization and sharing of data. It is important to note that this text is a first step, not a static, final result. Not all aspects of the standardization and publication of (meta-)data are discussed, nor are updates or feedback mechanisms for annotations and comments.
The idea is that this text can be used in full or in part and that it will include further chapters and section updates as time goes by and as other communities begin using it. Some archives will read through much of this and see confirmation of what they have already been implementing; others, especially the smaller institutions such as private memory institutions, will find it a low-key and hands-on introduction to help them in their efforts.
This article tackles the issue of integrating heterogeneous archival sources in one single data repository, namely the European Holocaust Research Infrastructure (EHRI) portal, whose aim is to support Holocaust research by providing online access to information about dispersed sources relating to the Holocaust (http://portal.ehri-project.eu). In this case, the problem at hand is to combine data coming from a network of archives in order to create an interoperable data space which can be used to search for, retrieve and disseminate content in the context of archival-based research. The scholarly purpose has specific consequences for our task. It requires that the information made available to the researcher be as close as possible to the originating source, in order to guarantee that the ensuing analysis can be deemed reliable. In the EHRI network of archives, as already observed in the case of the EU CENDARI project, one cannot but face heterogeneity. The EHRI portal brings together descriptions from more than 1900 institutions. Each archive comes with a whole range of idiosyncrasies corresponding to the way it has been set up and has evolved over time. Cataloging practices may also differ. Even the degree of digitization may range from the absence of a digital catalogue to the provision of a full-fledged online catalogue with all the necessary APIs for anyone to query and extract content. This heterogeneity contrasts with the global endeavour at the international level to develop and promote standards for the description of archival content as a whole. Nonetheless, in a project like EHRI, standards should play a central role.
They are necessary for many tasks related to the integration and exploitation of the aggregated content, namely:
● Being able to compare the content of the various sources, and thus to develop quality-checking processes;
● Defining an integrated repository infrastructure where the content of the various archival sources can be reliably hosted;
● Querying and re-using content in a seamless way;
● Deploying tools that have been developed independently of the specificities of the information sources, for instance in order to visualise or mine the resulting pool of information.
The central aspect of the work described in this paper is the assessment of the role of the EAD (Encoded Archival Description) standard as the basis for achieving the tasks described above. We have worked out a strategy for defining specific customizations of EAD that can be used at various stages of the process of integrating heterogeneous sources. While doing so, we have developed a methodology based on a specification and customization method inspired by the extensive experience of the Text Encoding Initiative (TEI) community. In the TEI framework, as we show in section 1, one can model specific subsets or extensions of the TEI guidelines while maintaining both the technical (XML schemas) and editorial (documentation) content within a single framework. This work leads us to anticipate that the method we have developed may be of wider interest in similar environments and also, we believe, for the future maintenance of the EAD standard. Finally, this work, successfully tested and implemented in the framework of EHRI [Riondet 2017], can be seen as part of the wider endeavour of European research infrastructures in the humanities, such as CLARIN and DARIAH, to support researchers in integrating the use of standards into their scholarly practices.
This is the reason why the general workflow studied here has been introduced as a use case in the umbrella infrastructure project PARTHENOS, which aims, among other things, at disseminating information and resources about methodological and technical standards in the humanities.
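To make the idea of a quality-checking process based on an EAD customization more concrete, the Python sketch below applies three content-oriented checks to a small EAD-like fragment. It is only an illustration under stated assumptions: the rules, the sample record and the function name `check_profile` are hypothetical and are not taken from the EHRI profile, the fragment uses the non-namespaced EAD 2002 element names, and the checks are written directly in Python rather than generated from a TEI-ODD specification.

```python
import re
import xml.etree.ElementTree as ET

# Shape checks only: three lowercase letters (as in ISO 639-2 codes) and
# two uppercase letters (as in ISO 3166-1 alpha-2 codes). Membership in the
# actual code lists is not verified in this sketch.
LANGCODE = re.compile(r"[a-z]{3}")
COUNTRYCODE = re.compile(r"[A-Z]{2}")

# Hypothetical sample record, invented for this example.
SAMPLE = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid countrycode="NL">Example 001</unitid>
      <unittitle>Example collection</unittitle>
      <langmaterial><language langcode="german">German</language></langmaterial>
    </did>
  </archdesc>
</ead>"""


def check_profile(xml_text: str) -> list[str]:
    """Return human-readable messages for every rule the record violates."""
    root = ET.fromstring(xml_text)
    problems = []
    # Illustrative rule 1: language codes must have the three-letter shape.
    for lang in root.iter("language"):
        code = lang.get("langcode", "")
        if not LANGCODE.fullmatch(code):
            problems.append(f"language/@langcode '{code}' is not a three-letter code")
    # Illustrative rule 2: country codes must have the two-letter shape.
    for unitid in root.iter("unitid"):
        code = unitid.get("countrycode", "")
        if not COUNTRYCODE.fullmatch(code):
            problems.append(f"unitid/@countrycode '{code}' is not a two-letter code")
    # Illustrative rule 3: every <did> needs a non-empty <unittitle>.
    for did in root.iter("did"):
        title = did.find("unittitle")
        if title is None or not (title.text or "").strip():
            problems.append("did has no non-empty unittitle")
    return problems


if __name__ == "__main__":
    for message in check_profile(SAMPLE):
        print(message)
```

Returning messages instead of raising on the first failure lets the same routine be run over records from many institutions and the results compared, which is closer to how a report on heterogeneous sources would be used.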