Discriminative n-gram language modeling

Publication: Article, 01 Apr 2007, English
Publisher: Elsevier BV
Journal: Computer Speech & Language, volume 21, pages 373-392 (ISSN: 0885-2308)

Authors: Brian Roark; Murat Saraclar; Michael Collins

doi: 10.1016/j.csl.2006.06.006

Abstract

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on maximizing the regularized conditional log-likelihood. The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We describe a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone. The final system achieves a 1.8% absolute reduction in WER for a baseline first-pass recognition system (from 39.2% to 37.4%), and a 0.9% absolute reduction in WER for a multi-pass recognition system (from 28.9% to 28.0%).
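The perceptron estimation described in the abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: it reranks short n-best lists instead of intersecting weighted automata with word lattices, it uses a plain (unaveraged) perceptron update, it treats the baseline recognizer score as a fixed additive term, and the function names (`ngram_features`, `train_perceptron`, `rerank`) and the toy data are hypothetical. It does show why the perceptron ends up with a small feature set: only n-grams that take part in an update ever receive a non-zero weight.

```python
from collections import Counter

def ngram_features(words, max_order=2):
    """Count all n-grams up to max_order, padding with sentence boundaries."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats

def score(weights, feats):
    """Dot product between the sparse weight vector and the feature counts."""
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())

def rerank(weights, candidates):
    """Return the index of the candidate with the highest combined score.

    Each candidate is (baseline_score, words); the baseline score is an
    assumed stand-in for the first-pass recognizer's log score.
    """
    return max(
        range(len(candidates)),
        key=lambda i: candidates[i][0]
        + score(weights, ngram_features(candidates[i][1])),
    )

def train_perceptron(data, epochs=2):
    """data: list of (candidates, oracle_index) pairs.

    oracle_index marks the lowest-error candidate in each list (assumed given).
    """
    weights = {}
    for _ in range(epochs):
        for candidates, oracle in data:
            best = rerank(weights, candidates)
            if best != oracle:
                # Promote the oracle's n-grams, demote the wrongly chosen ones.
                for f, c in ngram_features(candidates[oracle][1]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(candidates[best][1]).items():
                    weights[f] = weights.get(f, 0.0) - c
    # Keep only features with non-zero weight: the selected feature set.
    return {f: w for f, w in weights.items() if w != 0.0}

# Toy 2-best list (hypothetical): the baseline prefers the wrong hypothesis.
toy = [
    (
        [(-1.0, "wreck a nice beach".split()), (-1.2, "recognize speech".split())],
        1,  # index of the oracle (lowest-error) candidate
    )
]
weights = train_perceptron(toy)
chosen = rerank(weights, toy[0][0])
```

In this toy run a single update is enough for the reranker to prefer the oracle hypothesis, and boundary unigrams shared by both candidates cancel out and are dropped. A regularized conditional log-likelihood stage, as the abstract describes, could then start from these weights and this feature set; that stage is not sketched here.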

Related Organizations

Massachusetts Institute of Technology, United States
Oregon Health & Science University, United States
MIT Computer Science and Artificial Intelligence Laboratory, United States
Boğaziçi University, Turkey

Impact by BIP!

selected citations: 101. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.