crisesStorylinesRAG

This Zenodo record provides the data and code supporting the generation and validation of AI-derived disaster storylines and knowledge graphs from news articles. The dataset and pipeline are designed to augment historical disaster records with fact-based narratives and structured representations extracted at scale from media reports, using large language models (LLMs) with retrieval-augmented generation (RAG). The workflow takes disaster events from the EM-DAT database (2014–2024) as input and retrieves relevant news from the European Media Monitor (EMM). The retrieved articles are processed by LLMs provided by the GPT@JRC service, which generate coherent disaster storylines and corresponding knowledge graphs capturing hazards, impacts, drivers, and response actions.

The Zenodo archive includes:

input_emdat_1424.xlsx: Subset of EM-DAT disaster events (2014–2024) used as input to the pipeline, providing the event type, location, and temporal information used for news retrieval.
DisasterStory.csv: Output of the full pipeline, containing AI-generated disaster storylines and the associated knowledge graph representations for the events whose news articles were retrieved from EMM.
triplet_expert_val.xlsx: A labeled validation dataset of 1,000 factual triplets randomly sampled from the generated knowledge graphs and annotated by six independent experts, indicating whether each relationship is supported by the corresponding storyline text.
survey.xlsx: Results of the knowledge graph evaluation performed by disaster risk management (DRM) experts, summarizing expert assessments and consensus metrics for the generated graphs.
Source code: The complete pipeline for event selection, news retrieval, RAG-based storyline generation, knowledge graph construction, triplet extraction, and quantitative validation, available as a Zenodo snapshot for reproducibility.
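The retrieve-then-generate workflow described above can be illustrated with a minimal, hypothetical sketch. It ranks candidate news articles by simple keyword overlap with an event's type and country, then assembles a grounded prompt that asks the LLM to cover hazards, impacts, drivers, and response actions. The `DisasterEvent` and `Article` classes, their fields, and the keyword-overlap ranking are illustrative assumptions, not the pipeline's actual EMM retrieval or its real prompt; the call to the GPT@JRC model is left out because its interface is not described in this record.

```python
import re
from dataclasses import dataclass


@dataclass
class DisasterEvent:
    # Hypothetical fields loosely modeled on EM-DAT attributes (type, location, date);
    # NOT the actual column names of input_emdat_1424.xlsx.
    event_type: str
    country: str
    start_date: str  # ISO date string, e.g. "2022-06-14"


@dataclass
class Article:
    title: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens; deliberately naive (no stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_articles(event: DisasterEvent, articles: list[Article], top_k: int = 3) -> list[Article]:
    """Keyword-overlap ranking: a stand-in for the pipeline's real EMM retrieval."""
    query = tokenize(f"{event.event_type} {event.country}")
    scored = []
    for idx, article in enumerate(articles):
        score = len(query & tokenize(f"{article.title} {article.text}"))
        if score > 0:  # drop articles that share no query term
            scored.append((score, idx, article))
    # Highest overlap first; ties keep the original article order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [article for _, _, article in scored[:top_k]]


def build_prompt(event: DisasterEvent, context: list[Article]) -> str:
    """Assemble a grounded generation prompt from the retrieved articles."""
    sources = "\n\n".join(
        f"[{i}] {a.title}\n{a.text}" for i, a in enumerate(context, start=1)
    )
    return (
        "Using only the numbered news excerpts below, write a factual storyline of the "
        f"{event.event_type.lower()} in {event.country} that began on {event.start_date}. "
        "Describe hazards, impacts, drivers, and response actions, citing sources by number.\n\n"
        + sources
    )


if __name__ == "__main__":
    event = DisasterEvent("Flood", "Pakistan", "2022-06-14")
    articles = [
        Article("Markets close higher", "Stocks rose on Tuesday."),
        Article("Pakistan flood death toll rises", "Monsoon flood waters cover large areas."),
        Article("Relief efforts in Pakistan", "Aid convoys reach affected districts."),
    ]
    # The unrelated market story is filtered out; the two flood stories are numbered [1] and [2].
    print(build_prompt(event, rank_articles(event, articles)))
```

Numbering the excerpts in the prompt is one simple way to keep each generated statement traceable to a specific source text; a production retriever would replace the keyword overlap with whatever search the pipeline actually runs against EMM.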
Additional access points:
GitHub: https://github.com/jrcf7/crisesStorylinesRAG
Hugging Face Space: https://huggingface.co/spaces/roncmic/crisesStorylinesRAG

This resource enables quantitative evaluation of factual consistency and inter-annotator agreement for AI-generated disaster knowledge representations, and provides a reusable framework applicable to event catalogs beyond EM-DAT.
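The record mentions inter-annotator agreement on the six-expert triplet labels but does not name the statistic used. Fleiss' kappa is a common choice when every item is rated by the same number of annotators on categorical labels, so the sketch below uses it as an assumption, together with a simple majority vote that flags 3-3 ties. The label encoding (1 = supported by the storyline, 0 = not supported) and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories: Sequence[int] = (0, 1)) -> float:
    """Fleiss' kappa for items that are each rated by the same number of annotators.

    ratings[i] lists the labels all annotators gave item i, assuming the hypothetical
    encoding 1 = "supported by the storyline", 0 = "not supported".
    """
    if not ratings:
        raise ValueError("no items to score")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # counts[i][j]: how many annotators put item i in category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        if set(tally) - set(categories):
            raise ValueError(f"unexpected label in {row!r}")
        counts.append([tally[c] for c in categories])
    # Observed agreement: per-item share of agreeing annotator pairs, averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the pooled category proportions
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2 for j in range(len(categories)))
    if p_e == 1.0:
        return float("nan")  # only one category was ever used: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def majority_label(labels: Sequence[int]) -> Optional[int]:
    """Binary majority vote; returns None on a tie (possible with six annotators)."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0


if __name__ == "__main__":
    # Three toy triplets, six annotators each (illustrative values, not data from the record)
    ratings = [[1, 1, 1, 1, 1, 0], [0, 0, 0, 0, 0, 1], [1, 1, 1, 0, 0, 0]]
    print(fleiss_kappa(ratings), [majority_label(r) for r in ratings])
```

For the 1,000 triplets in triplet_expert_val.xlsx, `ratings` would hold one six-element label list per triplet; the column layout of that file is not described here, so the loading step is left out.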
