Powered by OpenAIRE graph
ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite

crisesStorylinesRAG

Authors: Ronco, Michele; Bandelli, Luca; Bertolini, Lorenzo; Consoli, Sergio; Delforge, Damien; Spadaro, Alessio; Verile, Marco; +1 Authors

Abstract

This Zenodo record provides the data and code supporting the generation and validation of AI-derived disaster storylines and knowledge graphs from news articles. The dataset and pipeline are designed to augment historical disaster records with fact-based narratives and structured representations extracted at scale from media reports, using large language models (LLMs) with retrieval-augmented generation (RAG).

The workflow uses disaster events from the EM-DAT database (2014–2024) as input to retrieve relevant news from the European Media Monitor (EMM). Retrieved articles are processed using LLMs provided by the GPT@JRC service, which generate coherent disaster storylines and corresponding knowledge graphs capturing hazards, impacts, drivers, and response actions.

The Zenodo archive includes:

  • input_emdat_1424.xlsx: Subset of EM-DAT disaster events (2014–2024) used as input to the pipeline, providing event type, location, and temporal information for news retrieval.
  • DisasterStory.csv: Output of the full pipeline, containing AI-generated disaster storylines and associated knowledge graph representations for events retrieved from EMM.
  • triplet_expert_val.xlsx: A labeled validation dataset of 1,000 factual triplets randomly sampled from the generated knowledge graphs and annotated by six independent experts, indicating whether each relationship is supported by the corresponding storyline text.
  • survey.xlsx: Contains the results of the knowledge graph evaluation performed by DRM experts, summarizing expert assessments and consensus metrics for the generated graphs.
  • Source code: The complete pipeline for event selection, news retrieval, RAG-based storyline generation, knowledge graph construction, triplet extraction, and quantitative validation. Available as a Zenodo snapshot for reproducibility.

Additional access points include:

  • GitHub: https://github.com/jrcf7/crisesStorylinesRAG
  • Hugging Face Space: https://huggingface.co/spaces/roncmic/crisesStorylinesRAG

This resource enables quantitative evaluation of factual consistency and inter-annotator agreement for AI-generated disaster knowledge representations and provides a reusable framework applicable to other event catalogs beyond EM-DAT.
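As a rough illustration of the kind of evaluation mentioned above, the following minimal Python sketch computes two simple quantities from binary "supported / not supported" labels: the share of triplets that a strict majority of annotators marked as supported, and the mean pairwise agreement between annotators. The data layout, the function names, and the toy labels are all assumptions made for this example; they are not taken from triplet_expert_val.xlsx, whose actual column structure and annotator assignment are not described in this record, and the metrics shown are not necessarily the ones used by the authors.

```python
from itertools import combinations

# Hypothetical toy annotations, NOT taken from the dataset:
# one row per triplet, one 0/1 label per annotator
# (1 = the relationship is supported by the storyline text).
annotations = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]


def majority_supported_rate(rows):
    """Fraction of triplets that a strict majority of annotators marked as supported."""
    supported = sum(1 for row in rows if sum(row) * 2 > len(row))
    return supported / len(rows)


def mean_pairwise_agreement(rows):
    """Average, over triplets, of the share of annotator pairs giving the same label."""
    per_row = []
    for row in rows:
        pairs = list(combinations(row, 2))
        per_row.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_row) / len(per_row)


if __name__ == "__main__":
    print("majority-supported rate:", majority_supported_rate(annotations))
    print("mean pairwise agreement:", round(mean_pairwise_agreement(annotations), 3))
```

Raw pairwise agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Fleiss' kappa would be a natural next step if every triplet carries labels from the same number of annotators.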
