ZENODO
Dataset . 2020
License: CC BY
Data sources: Datacite
https://doi.org/10.5281/zenodo...

Crowdsourcing Document Similarity Judgements

Authors: Amaral, Gabriel Maia Rocha


Abstract

This dataset contains the results of crowdsourcing tasks in which workers were asked to judge the similarity between pairs of documents. Each document, as well as each pair, has a unique ID. We presented the pairs to crowd workers through three task variations:

Variation 1: We showed workers 5 pairs of documents and, for each pair, asked them to rate its similarity on a 4-level Likert scale (None, Low, Medium, High), report how confident they were in their rating (from 0 to 4), and give a short written reason for the similarity level they chose. For quality control, two of the 5 pairs were gold standards, meaning we already knew their ratings and checked the workers' responses against them. Workers had to give the gold pair with the higher similarity a higher score than the other gold pair; otherwise, their answer was rejected.

Variation 2: We repeated Variation 1 with one alteration: instead of a Likert scale, we asked for a magnitude estimation, i.e. any number above 0. It could be 1, 0.0001, 1000, or 42, as long as the scores were coherent: a more similar pair had to receive a higher score than a less similar pair, and vice versa.

Variation 3: We showed workers 5 rankings. Each ranking had a main document and 3 auxiliary documents to be compared against it. Workers also reported a confidence score and gave a short written reason, as in Variation 1. The first ranking was a gold standard: we knew the values for its 3 pairs (the main document paired with each of the 3 auxiliary documents), and workers had to rank the gold pair with the higher similarity above the one with the lower similarity.

The raw results from the tasks are recorded in the JSON file CrowdResults.json. For a description of its contents, please read the file CrowdResults_README.md.
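The gold-standard check in Variations 1 and 3 can be sketched as follows. This is a minimal illustration, not the project's actual validation code: the field names, the Likert encoding, and the function signature are assumptions.

```python
# Sketch of the gold-standard quality check (Variation 1): a submission is
# accepted only if the gold pair known to be more similar received a strictly
# higher rating than the other gold pair. Encoding is an assumption.
LIKERT = {"None": 0, "Low": 1, "Medium": 2, "High": 3}

def passes_gold_check(ratings, gold_high_id, gold_low_id):
    """ratings maps pair IDs to Likert labels; gold_high_id is the gold pair
    with the known higher similarity."""
    return LIKERT[ratings[gold_high_id]] > LIKERT[ratings[gold_low_id]]

# Example: the more-similar gold pair was rated "Medium", the other "Low".
answers = {"p1": "Medium", "p2": "Low", "p3": "High", "p4": "None", "p5": "Low"}
print(passes_gold_check(answers, "p1", "p2"))  # True -> submission accepted
```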
These raw crowd annotations were then parsed into three CSV files, each corresponding to the aggregated results from one task variation. Three columns are common to all three files: pair_id is a unique identifier for each pair; similarity_alg is the similarity assigned to the pair of documents by an automated similarity algorithm; and relation is the type of relationship shown by the pair, where smaller values indicate more similar pairs. The remaining columns are:

final_scores_likert.csv (Variation 1): similarity_crowd_simple_maj stores the simple-majority result of the crowd's annotations; similarity_crowd_simple_mean stores the mean of the crowd's annotations; similarity_crowd_simple_median stores the median of the crowd's annotations.

final_scores_magnitude.csv (Variation 2): scaled_similarity_worker is the magnitude score scaled based on each worker's behaviour; scaled_similarity_worker_docset is the magnitude score scaled based on both the worker's behaviour and the pair.

final_scores_ranking.csv (Variation 3): mean_similarity is the mean ranking for the pair.

This dataset was built and used as part of the TheyBuyForYou project.
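The three aggregate columns of final_scores_likert.csv could be derived from per-pair Likert annotations as sketched below. This is an illustration only, using the Likert encoding 0=None through 3=High as an assumption; the authors' actual aggregation and tie-breaking may differ.

```python
# Sketch: deriving the three aggregate columns of final_scores_likert.csv
# from one pair's raw Likert annotations (0=None, 1=Low, 2=Medium, 3=High).
from collections import Counter
from statistics import mean, median

def aggregate(annotations):
    # Simple majority vote: the most frequent rating across workers.
    maj = Counter(annotations).most_common(1)[0][0]
    return {
        "similarity_crowd_simple_maj": maj,
        "similarity_crowd_simple_mean": mean(annotations),
        "similarity_crowd_simple_median": median(annotations),
    }

# Five workers rated one pair: Medium, Medium, High, Low, Medium.
print(aggregate([2, 2, 3, 1, 2]))
# {'similarity_crowd_simple_maj': 2, 'similarity_crowd_simple_mean': 2, 'similarity_crowd_simple_median': 2}
```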
