ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO, Datacite

Webis Crowd RAG Corpus 2025

Webis-CrowdRAG-25
Authors: Gienapp, Lukas; Hagen, Tim; Fröbe, Maik; Hagen, Matthias; Stein, Benno; Potthast, Martin; Scells, Harrisen

Abstract

Data Documentation

responses.jsonl.gz

RAG responses of about 250 words written by human writers and LLMs in different response styles.

| Key | Description | Source | Values |
| --- | --- | --- | --- |
| response | The UUID of the response / task. | Task Specification | UUID |
| topic | The topic ID of this task. | Task Specification | String ID |
| style | The text style of the response written for this task. | Task Specification | One of essay, news, bullet |
| kind | Whether this text was written by an LLM or a human. | Task Specification | One of human, llm |
| query | The query text of this topic. | TREC RAG | String value |
| references_ids | The IDs of the 20 sources retrieved for this topic's query. Aligned with references_texts. | TREC RAG | List of String IDs |
| references_texts | The texts of the 20 sources retrieved for this topic's query. Aligned with references_ids. | TREC RAG | List of String values |
| text | The text as written by the human author or LLM. | Writing Survey | String |
| cleaned_text | Text as cleaned by our preprocessing pipeline, without reference markers. | Writing Survey | String |
| statements | Text parsed into individual statements, each with the corresponding references_ids cited. | Writing Survey | List of Dictionaries |

ratings.jsonl.gz

Ratings on pairwise response utility as given by crowd workers. The columns prefixed {dimension} below are included once for each possible dimension (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

| Key | Description | Source | Value |
| --- | --- | --- | --- |
| submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID |
| query_id | The topic ID this response pair belongs to. | TREC RAG | String ID |
| response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID |
| response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID |
| worker | The UUIDs of the 5 workers completing this questionnaire. | Task Specification | List of UUID |
| {dimension}_vote | The individual votes for the specified dimension by the 5 workers. | Prolific Crowd Workers | List of string, each entry a, n, or b |
| {dimension}_spam_probability | The individual spam probabilities associated with each vote for the specified dimension. | Prolific Crowd Workers | List of float, each entry between 0 and 1 |
| {dimension}_p_a | The probability of the gold label being a for the specified dimension (first response better than second). | Prolific Crowd Workers | float |
| {dimension}_p_n | The probability of the gold label being n for the specified dimension (both responses equal). | Prolific Crowd Workers | float |
| {dimension}_p_b | The probability of the gold label being b for the specified dimension (second response better than first). | Prolific Crowd Workers | float |
| {dimension}_gold | The gold label with highest probability for the specified dimension. | Prolific Crowd Workers | a, n, or b |

llm_ratings.jsonl.gz

Ratings on pairwise response utility as given by an LLM. The columns prefixed {dimension} below are included once for each possible dimension (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

| Key | Description | Source | Value |
| --- | --- | --- | --- |
| submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID |
| query_id | The topic ID this response pair belongs to. | TREC RAG | String ID |
| response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID |
| response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID |
| inference | The inference mode the judgments were collected with. | Task Specification | combined or individual |
| {dimension} | The rating given by the LLM for this {dimension}. | LLM Inference | a, n, or b |

grades.jsonl.gz

Pointwise, per-topic ranked grades as inferred by a Bradley-Terry probabilistic model. Grades are relative ranks within a topic and must not be used as absolute values across topics.

| Key | Description | Source | Value |
| --- | --- | --- | --- |
| response | The UUID of the response. | Task Specification | UUID |
| correctness_topical | The topical correctness grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| coherence_logical | The logical coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| coherence_stylistic | The stylistic coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| coverage_broad | The broad coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| coverage_deep | The deep coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| consistency_internal | The internal consistency grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
| quality_overall | The overall quality grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1-6, per-topic relative ranks, higher is better |
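
As a rough illustration of how the files documented here might be consumed, the following Python sketch reads a gzip-compressed JSON Lines file, picks the most probable label from the `{dimension}_p_a`, `{dimension}_p_n`, and `{dimension}_p_b` fields, and groups grades by topic so that they are only compared within a topic. It assumes that each file holds one JSON object per line (inferred from the `.jsonl.gz` extension) and that `grades.jsonl.gz` can be joined to `responses.jsonl.gz` on the `response` key. The helper names (`read_jsonl_gz`, `argmax_label`, `grades_by_topic`), the tie-breaking order a, n, b, and all demo records are hypothetical and not part of the corpus.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def read_jsonl_gz(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def argmax_label(rating, dimension):
    """Return the label (a, n, or b) with the highest probability.

    Ties are broken in the order a, n, b; this is an assumption of the sketch.
    """
    probs = {label: rating[f"{dimension}_p_{label}"] for label in ("a", "n", "b")}
    return max(probs, key=probs.get)


def grades_by_topic(responses, grades, dimension):
    """Group (response UUID, grade) pairs by topic, highest grade first."""
    topic_of = {r["response"]: r["topic"] for r in responses}
    grouped = defaultdict(list)
    for grade in grades:
        grouped[topic_of[grade["response"]]].append((grade["response"], grade[dimension]))
    for pairs in grouped.values():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return dict(grouped)


# Synthetic placeholder records (NOT taken from the corpus) to exercise the helpers.
demo_rating = {
    "quality_overall_p_a": 0.7,
    "quality_overall_p_n": 0.2,
    "quality_overall_p_b": 0.1,
}
demo_responses = [
    {"response": "r1", "topic": "t1"},
    {"response": "r2", "topic": "t1"},
    {"response": "r3", "topic": "t2"},
]
demo_grades = [
    {"response": "r1", "quality_overall": 2},
    {"response": "r2", "quality_overall": 5},
    {"response": "r3", "quality_overall": 4},
]

# Round-trip the synthetic grades through a temporary .jsonl.gz file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades.jsonl.gz"
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in demo_grades:
            handle.write(json.dumps(record) + "\n")
    loaded_grades = list(read_jsonl_gz(path))

print(argmax_label(demo_rating, "quality_overall"))
print(grades_by_topic(demo_responses, loaded_grades, "quality_overall"))
```

With the real files, the same helpers would be pointed at the downloaded paths, for example `list(read_jsonl_gz("responses.jsonl.gz"))`. Grouping by topic before comparing grades follows the note that grades are per-topic relative ranks rather than absolute values.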
