
All data were imported from the Transkribus platform, where the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the PyLaia recognition engine and is now available for use. The data are divided into ten folders: one for each of the nine trainings, from the initial to the definitive one, plus one set for final validation. The eight previous trainings were carried out between June 2022 and May 2023. The history of all trainings can be read on e-Inquisition. Each folder corresponds to one collection on the platform; every collection contains a number of documents, and every document a number of images, or pages, as indicated below.

The ten uploaded folders (zip) are distributed as follows:
- nine Training Sets (TS) (ca 92% of the whole data; status of the transcriptions from the TS: Ground Truth);
- the final Validation Set (VS) (ca 8% of the whole data; status of the transcriptions from the VS: Ground Truth).

Each TS folder contains only the new data added for that training; these data were added to those of the previous trainings. Only the last VS, which is complete (505 p.), is provided. Each document consists of images and the corresponding transcribed pages. The Ground Truth transcriptions were made by the members of the TraPrInq project (Transcrever os processos da Inquisição portuguesa, 1536-1821 | Transcribing the court records of the Portuguese Inquisition, 1536-1821), which lasted from January 2023 to July 2024.

The majority of the documents are titled IL_number, that is, a document extracted from a trial record (processo) of the Inquisition of Lisbon, followed by the number of the processo. Other titles use the prefixes IC_ (Inquisition of Coimbra) and IE_ (Inquisition of Évora). Total of transcribed pages: 6,417. The images in the data (jpg) have the same quality as the images used for automatic transcription. All digitized images can be found in the catalog of the Portuguese National Archives (Arquivo Nacional da Torre do Tombo, ANTT).
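The title convention described above can be decoded programmatically. The following is a minimal sketch, not part of the dataset: it assumes that a title begins with one of the three prefixes, an underscore, and a run of digits (for example a hypothetical `IL_1234`). The names `TRIBUNALS`, `TITLE_PATTERN` and `parse_title` are illustrative. Titles that do not follow the convention return `None`, since only the majority of documents are titled this way.

```python
import re

# Prefixes as listed in the description; the dictionary name is illustrative.
TRIBUNALS = {
    "IL": "Inquisition of Lisbon",
    "IC": "Inquisition of Coimbra",
    "IE": "Inquisition of Évora",
}

# Assumed title shape: prefix, underscore, digits (e.g. "IL_1234").
TITLE_PATTERN = re.compile(r"^(IL|IC|IE)_(\d+)")


def parse_title(title):
    """Return (tribunal, processo_number) or None if the title does not match."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    prefix, number = match.groups()
    # The processo number is kept as a string so leading zeros are not lost.
    return TRIBUNALS[prefix], number
```

The processo number is returned as text rather than an integer, so it can be compared directly with the reference used in the archive catalog.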
Available data:

1. Ten zip files (total size 6.7 GB):
- Training Set1: 698 pages/images
- Training Set2: 984 pages/images
- Training Set3: 869 pages/images
- Training Set4: 926 pages/images
- Training Set5: 631 pages/images
- Training Set6: 665 pages/images
- Training Set7: 564 pages/images
- Training Set8: 549 pages/images
- Training Set9: 531 pages/images
- Validation Set_Final: 505 pages/images

2. One pdf file: Paleographical criteria used by the team for the transcription of the documents; list of characters (in Portuguese).
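The page counts listed above can be tallied with a short script. This is a minimal sketch with illustrative variable names; the figures are copied from the list. The nine training sets sum to 6,417 pages, which matches the total quoted in the description. Including the validation set, the ten folders hold 6,922 pages, of which the validation set makes up about 7.3%.

```python
# Page counts per folder, copied from the list of available data.
PAGES = {
    "Training Set1": 698,
    "Training Set2": 984,
    "Training Set3": 869,
    "Training Set4": 926,
    "Training Set5": 631,
    "Training Set6": 665,
    "Training Set7": 564,
    "Training Set8": 549,
    "Training Set9": 531,
    "Validation Set_Final": 505,
}

# Sum the nine training sets, then add the validation set.
training_total = sum(n for name, n in PAGES.items() if name.startswith("Training"))
validation_total = PAGES["Validation Set_Final"]
overall_total = training_total + validation_total

# Share of the validation set in all ten folders, in percent.
validation_share = 100 * validation_total / overall_total

print(training_total)              # 6417
print(overall_total)               # 6922
print(round(validation_share, 1))  # 7.3
```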
Inquisition, Portugal, Handwritten Text Recognition, HTR, Archive
