Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Code4Lib Journalarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Code4Lib Journal
Article . 2011
Data sources: DOAJ
addClaim

Using Amazon Mechanical Turk to Transcribe Historical Handwritten Documents

Authors: Andrew S.I.D. Lang; Joshua Rio-Ross;

Using Amazon Mechanical Turk to Transcribe Historical Handwritten Documents

Abstract

The developing “information age” is continually unraveling new ways of discovering, presenting and sharing information. Most new academic material is digitally formatted upon its creation and is thus easy to find and query. However, there remains a good deal of material from times prior to the “information age” that has yet to be converted to digital form. Much of this material can be found in library collections—whether academic, public or private—and thus remains available only to a limited number of locals or willing-and-able sojourners. Using OCR technology, most typeset documents can be digitized and made available online; and there are several projects underway to do exactly this. However, there remains little to be done for handwritten materials. Those who own collections of handwritten documents are increasingly wanting to make the content thereof available to the general public. Unfortunately, traditional transcription models typically prove to be expensive or inefficient and pdf snapshots are not searchable. We have developed a model for digital transcription using Google Docs and Amazon's Mechanical Turk. Using this model, one can use an online workforce to efficiently transcribe handwritten texts and perform quality control at a cost much lower than professional transcription services. To illustrate the model we used Amazon’s Mechanical Turk to transcribe and then proofread the Frederick Douglass Diary which we have made available on a public searchable wiki. The total cost of transcription and proofreading for the 72 page diary was less than $25.00 with some pages being transcribed and proofread for as little as $0.04. Our results show that using Amazon’s Mechanical Turk holds great promise for providing an affordable transcription method for hand-written historical documents making them easily sharable and fully searchable.

Keywords

Bibliography. Library science. Information resources, Z

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
gold