Using Amazon Mechanical Turk for linguistic research

Publication: Article, 01 Jan 2010, English. Publisher: National Library of Serbia. Journal: Psihologija, volume 43, pages 441-464 (ISSN: 0048-5705, eISSN: 1451-9283)

Authors: Schnoebelen, Tyler; Kuperman, Victor

doi: 10.2298/psi1004441s

Abstract

Amazon's Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods. One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823), and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is "lift" part of "lift up") manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing.
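
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a Cloze score is the proportion of participants who supply the target word and that the reported rho is Spearman's rank correlation; the abstract states neither explicitly. The function names and the small data set are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cloze_probability(responses, target):
    """Share of responses matching the target word (case-insensitive)."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(r.strip().lower() for r in responses)
    return counts[target.strip().lower()] / len(responses)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores for five target words, for illustration only.
    lab_scores = [0.90, 0.10, 0.45, 0.70, 0.30]
    turk_scores = [0.85, 0.20, 0.40, 0.55, 0.60]
    print(spearman_rho(lab_scores, turk_scores))
```

Working on ranks rather than raw proportions makes the comparison insensitive to differences in scale between the two participant pools, which is one plausible reason to prefer a rank correlation here.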

Related Organizations

McMaster University
Canada
Stanford University
United States

Keywords

semantic similarity, predictability, Amazon Mechanical Turk, Psychology, crowdsourcing, web experiments, BF1-990

Impact by BIP!

selected citations: 72. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Published in a Diamond OA journal

Fields of Science

social sciences

psychology and cognitive sciences

Related to Research communities

UArctic