ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
Data sources: Datacite

The REVERINO Collection of Regesta

Authors: Puccetti, Giovanni; Righi, Laura; Sabbatini, Ilaria; Esuli, Andrea

Abstract

Overview

The REVERINO Dataset is a collection of 4,533 pairs of Latin regesta (summaries) and their corresponding full-text medieval pontifical documents. The dataset is derived from two primary collections:

  • MGH: Epistolae saeculi XIII e regestis pontificum Romanorum selectae (1216-1268)
  • Auvray: Les Registres de Gregoire IX (1227/41)

The dataset is designed to support research in Latin text summarization and the development of tools for automatic regesta generation using Large Language Models (LLMs). It serves as a benchmark for evaluating the performance of LLMs in summarizing medieval Latin texts.

Dataset Structure

The dataset is organized into nine JSON files, each corresponding to a volume of the collections. Each JSON file contains an array of objects, where each object represents a single document with the following fields:

  • numero: A unique identifier for the document.
  • header: The header or title of the document, often including the date and location.
  • regesto: An array of strings representing the regestum (summary) of the document.
  • testo esteso: An array of strings representing the full text of the document.
  • apparato: An array of strings containing the apparatus (metadata or references) for the document.

Data Curation Process

The dataset was created through a four-step pipeline:

  1. Annotation: Manual annotation of selected pages using the eScriptorium platform to train segmentation models.
  2. Training: Adaptation of segmentation models to the specific layout of the manuscripts.
  3. Extraction: Automated extraction of text lines from the annotated pages.
  4. Post-processing: Separation of regesta, full texts, and apparatus using heuristics based on content and position.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material for any purpose, provided you give appropriate credit to the original authors.

References

Puccetti, Giovanni, Laura Righi, Ilaria Sabbatini, and Andrea Esuli. "REVERINO: REgesta generation VERsus latIN summarizatiOn." IRCDL, 2025.

Acknowledgments

This work was supported by the Italian Strengthening of ESFRI RI RESILIENCE (ITSERR) project, funded by the European Union under the NextGenerationEU funding scheme (CUP: B53C22001770006).

Contact

Giovanni Puccetti [giovanni.puccetti@isti.cnr.it]
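
Example: reading the regestum/full-text pairs

The field layout described under Dataset Structure can be read with the Python standard library alone. The following is a minimal sketch, not the authors' own tooling. It makes several assumptions that the record does not confirm: that the JSON files sit together in one local directory and end in ".json", that the arrays of strings hold text lines that can be joined with a single space, and that numero may be stored as either a number or a string. The names RegestumPair, iter_pairs, and load_collection are hypothetical helpers introduced here for illustration.

```python
import json
from pathlib import Path
from typing import Iterator, List, NamedTuple


class RegestumPair(NamedTuple):
    """One document: its identifier, header, summary, and full text."""
    numero: str
    header: str
    regestum: str
    full_text: str


def _join_lines(value) -> str:
    # Assumption: array fields hold text lines; join them with one space.
    # A bare string or a missing field is tolerated rather than rejected.
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip()
    return " ".join(str(line).strip() for line in value if str(line).strip())


def iter_pairs(json_path) -> Iterator[RegestumPair]:
    """Yield one RegestumPair per object in a single volume file."""
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)  # each file is an array of objects
    for record in records:
        yield RegestumPair(
            numero=str(record.get("numero", "")),
            header=_join_lines(record.get("header")),
            regestum=_join_lines(record.get("regesto")),
            full_text=_join_lines(record.get("testo esteso")),
        )


def load_collection(directory) -> List[RegestumPair]:
    """Read every *.json file in a directory (hypothetical local layout)."""
    pairs: List[RegestumPair] = []
    for path in sorted(Path(directory).glob("*.json")):
        pairs.extend(iter_pairs(path))
    return pairs
```

Calling load_collection on the directory holding the downloaded files returns a flat list in which each item pairs a summary with its source text, which is the shape a summarization benchmark usually needs. The apparato field is skipped here on purpose, and joining lines with a space does not repair words hyphenated across line breaks, so that step may need its own handling.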
