ZENODO
Presentation . 2025
License: CC BY
Data sources: Datacite

Write it Down! Fostering Responsible Reuse of Cultural Heritage Data with Interoperable Dataset Descriptions

Slides for a lightning talk at AI4LAM's annual conference Fantastic Futures, December 3 – 5, 2025, British Library, London
Authors: Alkemade, Henk; Candela, Gustavo; Claeyssens, Steven; Eskevich, Maria; Freire, Nuno; Gabriels, Nele; Irollo, Alba; +5 Authors

Abstract

Cultural heritage institutions have seen a surge in the creation of datasets ready for computational use, while researchers increasingly experiment with datasets through computational processing and AI-assisted methods. For both groups, issues of transparency have sparked interest in developing documentation practices that cut across the artificial intelligence/machine learning (AI/ML) and digital cultural heritage (DCH) sectors, aiming to provide better information on, for example, the purpose, composition, reusability, collection processes and provenance of datasets, or the societal biases reflected in them. The publication of Datasheets for Datasets (Gebru et al., 2021) and the Collections as Data movement (Padilla et al., 2023) have prompted the definition of guidelines for dataset creators and publishers who want to follow the FAIR and CARE principles and make it easier for others to reuse their data in a responsible, well-informed manner.

Gathering cultural heritage (CH) professionals, technical experts and humanities scholars from the Europeana Research and EuropeanaTech communities, the Datasheets for Digital Cultural Heritage working group has adapted existing ML documentation approaches to the DCH case. As a first outcome, a template (Alkemade et al., 2023) has sought to address the complexities of DCH datasets, which are shaped by layered curatorial decisions and often subject to evolving and non-linear trajectories. In the spirit of the common European data space for cultural heritage (2025), which is being deployed under the stewardship of the Europeana Initiative, the working group has since supported professionals interested in applying the template in their institutional context (see for example Lehmann et al., 2024) and fostered exchanges with other initiatives that are emerging at the European level and exploring suitable ways to describe datasets.
One key initiative in this regard is the proposal for Data-Envelopes for Cultural Heritage (Luthra et al., 2024), which has focused specifically on providing machine-readable descriptions of datasets, particularly with regard to the W3C Data Catalog Vocabulary (DCAT), which is used in many data portals. The goal of this collaboration is both to validate and further refine the existing templates following a community-led approach, and to investigate how to ensure (human-machine) interoperability in the data space, which aims to establish a diverse data offer (including datasets suitable for AI applications, as illustrated by the AI4Culture platform (2025)) and makes use of DCAT.

Our contribution will report on the following ongoing work:

  • Alignment with DCAT: DCH datasheet fields are being mapped to DCAT to enable machine-readability.
  • Alignment between DCH datasheets and data-envelopes, establishing conceptual and structural compatibility and supporting future integration with other legal, technical and ethical frameworks.
  • Gathering a set of exemplary dataset descriptions.
  • Creation of (prototype) tooling to support and simplify the creation, reuse and integration of descriptions into existing workflows.

We also plan to discuss new work items that will begin before the conference:

  • Identifying possible connections with data research plans and data management plans. This may extend to interoperability with the emerging European Cultural Heritage Cloud (ECHOES, 2025).
  • Establishing a modular structure for descriptions, aiming at operationalising the templates by defining building blocks, including a ‘core’ common to most DCH collections and a series of ‘profiles’ tailored to research data management and AI/ML workflows (e.g., the AI Model Research Documentation Sheet (AIRDocS) (Oberbichler, 2025)).
  • Providing guidance on using these modules and possibly developing custom ones.

While some components remain under active development (e.g., the prototype, the profiles and the guidelines for their development), we present this work in progress to foster dialogue and invite broader engagement from the Fantastic Futures community.

References

AI4Culture project. (2025). AI4Culture, Empowering Cultural Heritage through Artificial Intelligence. https://ai4culture.eu
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Irollo, A., Lehmann, J., Neudecker, C., Osti, G., & van Strien, D. (2023, September 25). Datasheets for Digital Cultural Heritage Datasets—Template v.1. Zenodo. https://zenodo.org/records/8375034
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., Osti, G., & Van Strien, D. (2023). Datasheets for Digital Cultural Heritage Datasets. Journal of Open Humanities Data, 9, 17. https://doi.org/10.5334/johd.124
Common European data space for cultural heritage. (2025). Welcome to the Common European data space for cultural heritage. https://www.dataspace-culturalheritage.eu/en
ECHOES project. (2025). ECCCH, The Cultural Heritage Cloud. https://www.echoes-eccch.eu/
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Lehmann, J., & Schneider, S. (2024). Metadata of the "Alter Realkatalog" (ARK) of Berlin State Library (SBB). https://doi.org/10.5281/zenodo.13284442
Luthra, M., & Eskevich, M. (2024). Data-Envelopes for Cultural Heritage: Going beyond Datasheets. In I. Siegert & K. Choukri (Eds.), Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024 (pp. 52–65). ELRA and ICCL. https://aclanthology.org/2024.legal-1.9
Oberbichler, S. (2025). AI Model Research Documentation Sheet (AIRDocS). https://doi.org/10.5281/zenodo.15046713
Padilla, T., Scates Kettler, H., Varner, S., & Shorish, Y. (2023). Vancouver Statement on Collections as Data. https://zenodo.org/records/8342171
Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. 2022 ACM Conference on Fairness, Accountability, and Transparency, 1776–1826. https://doi.org/10.1145/3531146.3533231
World Wide Web Consortium. (2024). Data Catalog Vocabulary (DCAT) - Version 3. https://www.w3.org/TR/vocab-dcat-3/

Keywords

Metadata, datasheets, data envelopes, Datasets as Topic, interoperability
