Powered by OpenAIRE graph
ZENODO
Presentation . 2025
License: CC BY
Data sources: Datacite
Versions: 2

Encoding Complexity - TEI Modeling in the 'Forschungsportal BACH' Project

Author: Quenouille, Nadine

Abstract

The long-term project "Forschungsportal BACH", launched in 2023, is a collaboration between the Saxon Academy of Sciences and Humanities in Leipzig and the Bach Archive Leipzig. Its goal is to document, digitally process, and publish in an online research portal all extant non-musical documents relating to the family of Johann Sebastian Bach, from the late 16th to the early 19th century. The textual sources comprise private and official correspondence, educational and professional records, legal documents, and related materials such as student registers, timetables, account books, wills, petitions, official records, and copies and transcripts of various public documents. Letters make up only a small part of this highly heterogeneous corpus. The project has established a multi-stage workflow: the sources are digitized during archive visits and recorded in the project's database; their text is then recognized and transcribed automatically in "Transkribus", where it is also corrected and structurally annotated; the Transkribus output in PAGE XML is converted automatically to TEI via XSLT; and further textual annotation is carried out in "TEI Publisher". To ensure standardization and long-term usability, all texts are encoded in TEI P5. The encoding follows established guidelines, although the diversity and specificity of the sources occasionally pose challenges. This presentation addresses specific challenges in modeling a structurally and semantically heterogeneous corpus, using three very different document types as examples: a will, a school register, and a coherent document written by a single scribe that contains transcripts and attachments of multiple other documents. The focus is on questions of structure, annotation, and semantic interpretation, especially where the current TEI modules may not fully meet the needs of the sources.
The presentation also proposes ways to address these challenges within the current TEI framework, for example through a nuanced combination of existing elements and careful modeling that closely follows the materiality of the sources. The aim is to make the strategies developed within the project transparent and to stimulate discussion on how to approach structurally and semantically diverse sources, particularly where existing modules may not scale to the demands of the project. Throughout, the encoding remains TEI-valid and deliberately avoids project-specific customizations (that is, a separate schema against which it could be validated), because the full scope of the editorial requirements will only become clear as the work progresses. The presentation thus offers insights into editorial practice with a heterogeneous corpus and invites discussion of modeling strategies for complex and diverse archival collections.
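As a rough illustration of the conversion step in the workflow (Transkribus PAGE XML to TEI), the following minimal sketch maps PAGE text lines to TEI `<lb/>` milestones inside `<p>` elements, with one `<pb/>` per page. The project performs this step with XSLT; Python is swapped in here only for illustration. The helper name `page_to_tei_div`, the PAGE 2013-07-15 namespace, the sample input, and the one-paragraph-per-region mapping are all assumptions, not the project's actual rules. Reading order and Transkribus structure tags are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2013-07-15 namespace, as commonly found in Transkribus exports.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"pc": PAGE_NS}


def tei(tag: str) -> str:
    """Return a TEI tag name in Clark notation, e.g. '{http://www.tei-c.org/ns/1.0}p'."""
    return f"{{{TEI_NS}}}{tag}"


def page_to_tei_div(page_xml: str) -> ET.Element:
    """Hypothetical helper: convert one PAGE XML document into a TEI <div>.

    Each PAGE <Page> becomes a <pb/> pointing at its image file, each
    <TextRegion> becomes a <p>, and each <TextLine> becomes an <lb/>
    followed by the line's text. Reading order and structure tags are ignored.
    """
    root = ET.fromstring(page_xml)
    div = ET.Element(tei("div"))
    for page in root.findall("pc:Page", NS):
        ET.SubElement(div, tei("pb"), facs=page.get("imageFilename", ""))
        for region in page.findall(".//pc:TextRegion", NS):
            p = ET.SubElement(div, tei("p"))
            for line in region.findall("pc:TextLine", NS):
                lb = ET.SubElement(p, tei("lb"))
                # Use the line-level transcription only; word-level TextEquiv is skipped.
                lb.tail = line.findtext(
                    "pc:TextEquiv/pc:Unicode", default="", namespaces=NS
                )
    return div


# Hypothetical sample input with one page, one region and two lines.
SAMPLE_PAGE = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="scan_001.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>first line</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>second line</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    print(ET.tostring(page_to_tei_div(SAMPLE_PAGE), encoding="unicode"))
```

Mapping every region to a `<p>` is the simplest possible choice. Documents such as a school register or a will with embedded copies may need other TEI structures, which is the kind of modeling question the presentation raises.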

Keywords

Forschungsportal BACH, Historical Texts Annotation, Heterogeneous Corpus