ZENODO
Dataset . 2024
License: CC BY
Data sources: Datacite

Portuguese Handwriting 16th-19th c.

AI Model training data
Authors: Baudry, Hervé; Pedro, Susana Tavares; de Campos, Marize Helena; Soares Fatela, Mário; Garcia, Leonor Dias; Paulo, Jorge Ferreira; Pereira, Maria Olinda Alves; +4 Authors

Abstract

All data were imported from the Transkribus platform, on which the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the PyLaia recognition engine; the model can now be used. The data are divided into ten folders: one for each of the nine trainings, from the initial to the definitive one, plus one set for final validation. The eight earlier trainings were carried out between June 2022 and May 2023. The history of all trainings can be read on e-Inquisition. Each folder corresponds to one collection on the platform; every collection contains a number of documents, and every document a number of images, or pages, as indicated below.

The ten uploaded folders (zip) are distributed as follows:
  • nine Training Sets (TS): ca. 92% of the whole data; status of the transcriptions: Ground Truth;
  • the final Validation Set (VS): ca. 8% of the whole data; status of the transcriptions: Ground Truth.

Each TS folder contains only the new data added for the corresponding training, on top of the data from the preceding sets. Only the last VS, which is complete (505 p.), is provided. Each document consists of images and their transcribed pages. Ground Truth means a transcription made by the members of the TraPrInq project (Transcrever os processos da Inquisição portuguesa, 1536-1821 | Transcribing the court records of the Portuguese Inquisition, 1536-1821), which lasted from January 2023 to July 2024.

Most documents are titled as follows: IL_number, that is, a document extracted from a trial record (processo) of the Inquisition of Lisbon, followed by the number of the processo. Other titles: IC_ = Inquisition of Coimbra; IE_ = Inquisition of Évora.

Total of transcribed pages in the nine training sets: 6,417. The quality of the images in the data (jpg) is equal to that of the images used for automatic transcription. All digitized images can be found in the catalog of the Portuguese National Archives (Arquivo Nacional da Torre do Tombo, ANTT).
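
The document title convention (IL_, IC_ or IE_ followed by the number of the processo) lends itself to a small parser. The following is a minimal sketch in Python, not part of the dataset: the names `parse_title`, `ParsedTitle` and `TRIBUNALS` are hypothetical, the sample titles are invented for illustration, and the pattern assumes that a conforming title starts with the prefix, an underscore and then digits. Since only the majority of documents follow the convention, titles that do not match return `None`.

```python
import re
from typing import NamedTuple, Optional

# Prefix-to-tribunal mapping, taken from the dataset description.
TRIBUNALS = {
    "IL": "Inquisition of Lisbon",
    "IC": "Inquisition of Coimbra",
    "IE": "Inquisition of Évora",
}

# Assumption: a conforming title starts with prefix, underscore, digits.
TITLE_RE = re.compile(r"^(IL|IC|IE)_(\d+)")


class ParsedTitle(NamedTuple):
    tribunal: str
    processo: int


def parse_title(title: str) -> Optional[ParsedTitle]:
    """Return tribunal and processo number, or None for non-conforming titles."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    prefix, number = match.groups()
    return ParsedTitle(TRIBUNALS[prefix], int(number))


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    for sample in ("IL_1234", "IE_56", "untitled"):
        print(sample, "->", parse_title(sample))
```

Returning `None` rather than raising keeps the sketch usable on a whole collection, where some titles may follow other conventions.
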

Available data:

1- ten zip files (total size 6.7 GB):
  • Training Set1: 698 pages/images
  • Training Set2: 984 pages/images
  • Training Set3: 869 pages/images
  • Training Set4: 926 pages/images
  • Training Set5: 631 pages/images
  • Training Set6: 665 pages/images
  • Training Set7: 564 pages/images
  • Training Set8: 549 pages/images
  • Training Set9: 531 pages/images
  • Validation Set_Final: 505 pages/images

2- one pdf file: paleographical criteria used by the team for the transcription of the documents; list of characters (in Portuguese).
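
The page counts listed above can be checked against the totals and shares quoted in the abstract. The sketch below is illustrative only; the names `PAGES` and `split_summary` are hypothetical, and the figures are copied from the list of zip files. The nine training sets sum to 6,417 pages, and together with the 505 validation pages the split works out to roughly 92.7% training and 7.3% validation, in line with the approximate shares given.

```python
# Page counts per folder, copied from the list of available data.
PAGES = {
    "Training Set1": 698,
    "Training Set2": 984,
    "Training Set3": 869,
    "Training Set4": 926,
    "Training Set5": 631,
    "Training Set6": 665,
    "Training Set7": 564,
    "Training Set8": 549,
    "Training Set9": 531,
    "Validation Set_Final": 505,
}


def split_summary(pages: dict) -> dict:
    """Sum training and validation pages and compute each share of the whole."""
    training = sum(n for name, n in pages.items() if name.startswith("Training Set"))
    validation = sum(n for name, n in pages.items() if name.startswith("Validation Set"))
    total = training + validation
    return {
        "training_pages": training,
        "validation_pages": validation,
        "total_pages": total,
        "training_share": training / total,
        "validation_share": validation / total,
    }


if __name__ == "__main__":
    summary = split_summary(PAGES)
    print(f"training:   {summary['training_pages']} pages ({summary['training_share']:.1%})")
    print(f"validation: {summary['validation_pages']} pages ({summary['validation_share']:.1%})")
    print(f"total:      {summary['total_pages']} pages")
```
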

Keywords

Inquisition, Portugal, Handwritten Text Recognition, HTR, Archive
