ZENODO
Dataset . 2025
License: CC 0
Data sources: Datacite

OCR Groundtruth for Swinemünder Badeanzeiger

Authors: Steiner, Steffen; Krüger, Frank

Abstract

This dataset contains ground-truth annotations for extracting and structuring information from the tables of the historical newspaper "Swinemünder Badeanzeiger". The newspaper was obtained from the Digitale Bibliothek Mecklenburg-Vorpommern: https://www.digitale-bibliothek-mv.de/viewer/toc/PPN636776093/

The data was created by selecting one "Swinemünder Badeanzeiger" image per year and manually transcribing its content.

The dataset is organized by the newspaper's publication year. Each year's folder contains a subfolder named after the original image ID, which holds the following files:

  • table_[running_number].jpg: image of the segmented table
  • table_[running_number]_annotation.json: data extracted and structured from the segmented image by manual transcription
  • table_[running_number]_index_connected.json: list that links each entry to its corresponding table rows, so that entries spanning multiple rows are preserved

For each entry, a JSON object was created and added to table_[running_number]_annotation.json. It consists of the following fields:

  • input: Transcription of the original row, including markers for columns
  • Nummer: The sequence number of the row, as extracted from the input field
  • Vorname: The first name, if it exists; otherwise null
  • Nachname: The last name, if it exists; otherwise null
  • Titel: The (academic) title, if it exists; otherwise null
  • Beruf: The profession, if it exists; otherwise null
  • Sozialer Stand: The social status, if it exists; otherwise null
  • Begleitung: Any companions, such as family members or servants, if they exist; otherwise null
  • Wohnort: The city the person(s) arrived from, if it exists; otherwise null
  • Wohnung: The local residence, such as a hotel, pension, or vacation home, if it exists; otherwise null
  • Personenanzahl: The total number of persons represented by this entry

In addition to the separate annotation files, the file swinebad_groundtruth.json contains a complete list of all entries to simplify data analysis. To this end, each entry was extended with the following field:

  • date: The publication date of the newspaper issue in which the entry was published

The following example shows an entry obtained from the fourth line of the table published at https://www.digitale-bibliothek-mv.de/viewer/image/PPN636776093_1910/1/LOG_0003/

    {
      "input": "973 | Dr. Auerbach, Richard, Journalist, mit Frau | „ | Villa Kaiser Wilhelm | 2",
      "Nummer": "973",
      "Vorname": "Richard",
      "Nachname": "Auerbach",
      "Titel": "Dr.",
      "Beruf": "Journalist",
      "Sozialer Stand": null,
      "Begleitung": "mit Frau",
      "Wohnort": "Berlin",
      "Wohnung": "Villa Kaiser Wilhelm",
      "Personenanzahl": "2",
      "date": "1910-06-06"
    }

You are welcome to cite the 'Digitale Bibliothek MV / Universität Greifswald' (plus the URN for digital publications or the shelfmark for printed publications) as the source for the images.
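The entry format described above could be processed along the following lines. This is a minimal sketch, not the authors' tooling: it assumes that swinebad_groundtruth.json is a top-level JSON array of entry objects, that "|" is the column marker in the input field (as in the example entry), and that Personenanzahl holds a string of digits. The helper names split_columns and persons_per_year are hypothetical. The sample data is the single example entry from the description, embedded so that the block runs on its own.

```python
from __future__ import annotations

import json
from collections import Counter

# The example entry from the dataset description, wrapped in a list.
# ASSUMPTION: the real swinebad_groundtruth.json has this top-level list layout.
SAMPLE = """
[
  {
    "input": "973 | Dr. Auerbach, Richard, Journalist, mit Frau | „ | Villa Kaiser Wilhelm | 2",
    "Nummer": "973",
    "Vorname": "Richard",
    "Nachname": "Auerbach",
    "Titel": "Dr.",
    "Beruf": "Journalist",
    "Sozialer Stand": null,
    "Begleitung": "mit Frau",
    "Wohnort": "Berlin",
    "Wohnung": "Villa Kaiser Wilhelm",
    "Personenanzahl": "2",
    "date": "1910-06-06"
  }
]
"""


def split_columns(raw_row: str) -> list[str]:
    """Split a transcribed row into its table columns.

    ASSUMPTION: '|' is the column marker, as in the example entry.
    """
    return [cell.strip() for cell in raw_row.split("|")]


def persons_per_year(entries: list[dict]) -> Counter:
    """Sum Personenanzahl per publication year (first four characters of 'date')."""
    counts: Counter = Counter()
    for entry in entries:
        date = entry.get("date")
        persons = entry.get("Personenanzahl")
        if date is None or persons is None:
            continue  # skip entries lacking either field
        counts[date[:4]] += int(persons)
    return counts


entries = json.loads(SAMPLE)
columns = split_columns(entries[0]["input"])
per_year = persons_per_year(entries)
```

To run the same functions on the published file, SAMPLE could be replaced by the contents of swinebad_groundtruth.json read with UTF-8 encoding, provided the file follows the layout assumed here.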
