ZENODO
Dataset . 2020
License: CC BY
Data sources: Datacite
https://doi.org/10.5281/zenodo...

Crowdsourcing Document Similarity Judgements

Authors: Amaral, Gabriel Maia Rocha


Abstract

This dataset contains the results of crowdsourcing tasks in which workers were asked to judge the similarity between pairs of documents. Each document, as well as each pair, has a unique ID. We presented the pairs to crowd workers through three task variations:

Variation 1: We showed workers 5 pairs of documents and, for each pair, asked them to rate its similarity on a 4-level Likert scale (None, Low, Medium, High), report how confident they were in their rating (from 0 to 4), and give a short written reason for the similarity level they chose. For quality control, two of the 5 pairs were gold standards, meaning we already knew their ratings and checked the workers' responses against them. Workers had to give the gold pair with the higher similarity a higher score than the other gold pair; otherwise, their answer was rejected.

Variation 2: We repeated Variation 1 with one alteration: instead of a Likert scale, we asked for a magnitude estimation, i.e. any number above 0. It could be 1, 0.0001, 1000, or 42, as long as the scores were coherent: a more similar pair had to receive a higher score than a less similar pair, and vice versa.

Variation 3: We showed workers 5 rankings. Each ranking had a main document and 3 auxiliary documents to be compared against it. Workers also reported a confidence score and gave a short written reason, as in Variation 1. The first ranking was a gold standard: we knew the values for its 3 pairs (the main document paired with each of the 3 auxiliary documents), and workers had to rank the gold pair with the higher similarity above the one with the lower similarity.

The raw results from the tasks are recorded in the JSON file CrowdResults.json. For a description of its contents, please read the file CrowdResults_README.md.
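The gold-standard check in Variations 1 and 3 can be sketched as follows. This is a minimal illustration, not the project's actual validation code: the field names, the Likert encoding, and the function signature are assumptions.

```python
# Sketch of the gold-standard quality check (Variation 1): a submission is
# accepted only if the gold pair known to be more similar received a strictly
# higher rating than the other gold pair. Encoding is an assumption.
LIKERT = {"None": 0, "Low": 1, "Medium": 2, "High": 3}

def passes_gold_check(ratings, gold_high_id, gold_low_id):
    """ratings maps pair IDs to Likert labels; gold_high_id is the gold pair
    with the known higher similarity."""
    return LIKERT[ratings[gold_high_id]] > LIKERT[ratings[gold_low_id]]

# Example: the more-similar gold pair was rated "Medium", the other "Low".
answers = {"p1": "Medium", "p2": "Low", "p3": "High", "p4": "None", "p5": "Low"}
print(passes_gold_check(answers, "p1", "p2"))  # True -> submission accepted
```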
These raw crowd annotations were then parsed into three CSV files, each corresponding to the aggregated results from one task variation. Three columns are common to all three files: pair_id is a unique identifier for each pair; similarity_alg is the similarity assigned to the pair of documents by an automated similarity algorithm; and relation is the type of relationship shown by the pair, where smaller values indicate more similar pairs. The remaining columns are:

final_scores_likert.csv (Variation 1): similarity_crowd_simple_maj stores the simple-majority result of the crowd's annotations; similarity_crowd_simple_mean stores the mean of the crowd's annotations; similarity_crowd_simple_median stores the median of the crowd's annotations.

final_scores_magnitude.csv (Variation 2): scaled_similarity_worker is the magnitude score scaled based on each worker's behaviour; scaled_similarity_worker_docset is the magnitude score scaled based on both the worker's behaviour and the pair.

final_scores_ranking.csv (Variation 3): mean_similarity is the mean ranking for the pair.

This dataset was built and used as part of the TheyBuyForYou project.
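The three aggregate columns of final_scores_likert.csv could be derived from per-pair Likert annotations as sketched below. This is an illustration only, using the Likert encoding 0=None through 3=High as an assumption; the authors' actual aggregation and tie-breaking may differ.

```python
# Sketch: deriving the three aggregate columns of final_scores_likert.csv
# from one pair's raw Likert annotations (0=None, 1=Low, 2=Medium, 3=High).
from collections import Counter
from statistics import mean, median

def aggregate(annotations):
    # Simple majority vote: the most frequent rating across workers.
    maj = Counter(annotations).most_common(1)[0][0]
    return {
        "similarity_crowd_simple_maj": maj,
        "similarity_crowd_simple_mean": mean(annotations),
        "similarity_crowd_simple_median": median(annotations),
    }

# Five workers rated one pair: Medium, Medium, High, Low, Medium.
print(aggregate([2, 2, 3, 1, 2]))
# {'similarity_crowd_simple_maj': 2, 'similarity_crowd_simple_mean': 2, 'similarity_crowd_simple_median': 2}
```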
