ZENODO
Other literature type . 2025
License: CC BY
Data sources: ZENODO

Leveraging AI for Enhanced Archaeological Data Extraction: Workflows for Textual and Image-Based Data

Authors: Pajdla, Petr; Novák, David; Harasim, Ronald; Křivánková, Dana; Straňák, Pavel; Lutsai, Kateryna; Lečbychová, Olga


Abstract

The digitization of archaeological archives, particularly grey literature and archival photographs, holds immense potential for knowledge discovery. However, manual processing of such data is labour-intensive and often inconsistent, making it a prime candidate for automation. This paper presents a pilot implementation of reusable digital research workflows that integrate text and image recognition technologies with AI models to streamline the analysis of archaeological documentation. These workflows are being developed to enhance (meta)data quality in the Archaeological Map of the Czech Republic (AMCR) digital repository and in the ARIADNE Knowledge Base and discovery service.

Our approach to textual data leverages OCR/HTR and NLP tools to process archival reports, generating machine-readable text from a combination of manuscripts, typescripts, and printed materials. Using AI-driven information extraction techniques, we prepare models for automated segmentation and OCR/HTR processing of documents, implemented through the e-Scriptorium service and a newly developed dashboard. Based on the recognition outputs, LINDAT/CLARIAH-CZ open-source tools are applied for enhanced full-text search (tokenization, tagging, lemmatization, etc.; UDPipe), keyword identification (KER), and named entity recognition (personal and place names, temporal data, AMCR vocabulary terms, identifiers, etc.; NameTag). The goal is an integrated solution that enables processing of both legacy data and new uploads to the AMCR system and offers users more efficient services for searching and processing documents. A secondary objective is to simplify archival procedures by automating some of the steps involved in describing and archiving documents.

In parallel, we implement an object recognition workflow for the detection and classification of archaeological objects, i.e. artefacts and other objects of interest, in archival photographs. By adapting and fine-tuning deep learning models (e.g., ResNet) for archaeology, we segment and annotate archival photographs according to AMCR controlled vocabularies. Two types of image datasets are used: first, images of single finds, often photographed on standardised backgrounds with scales; and second, images with varied content, including fieldwork photographs of trenches, burials, etc. Mapping the vocabularies used across the datasets to Getty AAT terms ensures interoperability within the ARIADNE infrastructure. This workflow streamlines the annotation of archival photographs with terms from domain-specific controlled vocabularies and allows identification of archaeological artefacts and other objects of interest. This simplifies the otherwise time-consuming task of creating metadata and, at the same time, opens new doors for connecting and cross-referencing image data with textual data, e.g. grey-literature find reports.

The talk summarises the journey towards implementing both workflows and discusses what has worked so far and what has not, including the dead ends we encountered and what we learned along the way. The current state of the workflows' implementation is demonstrated on pilot results based on archival textual and image documents, showcasing how AI technologies can enhance the processing of archaeological archives and foster further research.

Presentation from a talk given at the CAA2025 Digital Horizons conference in session 19, "Reusable Digital Research Workflows for Archaeology".

Keywords

Archaeology
