Mismatched Crowdsourcing based Language Perception for Under-resourced Languages

Publication: Article, Conference object. 01 Jan 2016. English. Publisher: Elsevier BV. Journal: Procedia Computer Science, volume 81, pages 23-29 (ISSN: 1877-0509)

Authors: Wenda Chen; Mark Hasegawa-Johnson; Nancy F. Chen

doi: 10.1016/j.procs.2016.04.025


Abstract

Mismatched crowdsourcing is a technique for acquiring training data for automatic speech recognizers in under-resourced languages. Workers who do not know the target language transcribe its speech, and their transcriptions are decoded with a noisy-channel model of cross-language speech perception. All previous mismatched crowdsourcing studies used English-speaking transcribers; this study is the first to recruit transcribers with a different native language, in this case Mandarin Chinese. Using these data, we compute statistical models of the transcribers' cross-language perception of tones and phonemes, based on phone distinctive features and tone features. By analyzing the phonetic and tonal variation mappings and their coverage relative to the dictionary of the target language, we evaluate the effect of the transcribers' native language on their performance.
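To make the noisy-channel idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-phone setting in which a hypothetical list of (spoken, heard) pairs is available, estimates P(heard | spoken) with add-one smoothing, and decodes by maximizing P(spoken) · P(heard | spoken). The function names `estimate_channel` and `decode`, the toy phone labels, and the priors are all illustrative assumptions; the paper itself models perception with distinctive features and tone features rather than raw symbols.

```python
from collections import Counter, defaultdict


def estimate_channel(pairs, alphabet, smoothing=1.0):
    """Estimate P(heard | spoken) from (spoken, heard) pairs with add-k smoothing.

    `pairs` and `alphabet` are hypothetical toy inputs, not data from the paper.
    """
    counts = defaultdict(Counter)
    for spoken, heard in pairs:
        counts[spoken][heard] += 1

    channel = {}
    for spoken, heard_counts in counts.items():
        total = sum(heard_counts.values()) + smoothing * len(alphabet)
        channel[spoken] = {
            h: (heard_counts[h] + smoothing) / total for h in alphabet
        }
    return channel


def decode(heard, prior, channel):
    """Return the spoken phone that maximizes P(spoken) * P(heard | spoken)."""
    return max(
        prior,
        key=lambda spoken: prior[spoken] * channel[spoken].get(heard, 0.0),
    )


# Toy example: transcribers often write "b" when the target phone was "p".
toy_pairs = [("p", "b"), ("p", "b"), ("p", "p"),
             ("b", "b"), ("b", "b"), ("b", "b")]
toy_channel = estimate_channel(toy_pairs, alphabet=["p", "b"])

# With a uniform prior the channel alone decides; a skewed prior
# (standing in for a target-language model) can change the decision.
print(decode("b", {"p": 0.5, "b": 0.5}, toy_channel))
print(decode("b", {"p": 0.8, "b": 0.2}, toy_channel))
```

In this sketch the prior plays the role of knowledge about the target language, while the channel table captures how a given group of transcribers mishears it; comparing channel tables estimated from different transcriber groups is one simple way to look at native-language effects.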

Related Organizations

University of Illinois at Urbana-Champaign
United States
Institute for Infocomm Research
Singapore
Agency for Science, Technology and Research
Singapore

Keywords

Mismatched Crowdsourcing, Speech Recognition, Speech Perception, Low Resource Language

Impact by BIP!

Selected citations: 4
Popularity: Average
Influence: Average
Impulse: Top 10%