Language modelling for efficient beam-search

Publication: Article, 01 Oct 1995
Language: English
Publisher: Elsevier BV
Journal: Computer Speech & Language, volume 9, pages 353-379 (ISSN: 0885-2308)

Authors: Federico, Marcello; Cettolo, Mauro; Brugnara, Fabio; Antoniol, G.

doi: 10.1006/csla.1995.0017

handle: 11582/3404

Abstract

This paper addresses two problems: estimating bigram language models, and representing them efficiently as a finite-state network that can be used by a hidden Markov model based, beam-search, continuous speech recognizer. A review of the best-known bigram estimation techniques is given, together with a description of the original Stacked model. Language models are compared in terms of perplexity on three text corpora with different degrees of data sparseness, and speech recognition accuracy tests are presented for a 10 000-word, real-time, speaker-independent dictation task. The Stacked estimation method compares favourably with the others, achieving about 93% word accuracy. While better language model estimates can improve recognition accuracy, representations better suited to the search algorithm can also improve its speed. Two static representations of language models are introduced: linear and tree-based. Results show that the tree-based organization is better exploited by the beam-search algorithm, providing a five times faster response with the same word accuracy. Finally, an off-line reduction algorithm is presented that cuts the space requirements of the tree-based topology to about 40%. The proposed solutions have been successfully employed in a real-time, speaker-independent, 10 000-word dictation system for radiological reporting.
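For readers unfamiliar with the perplexity measure used in the abstract's comparisons, the following is a minimal sketch of how a smoothed bigram model can be estimated and scored. It is not the Stacked model or any estimator from the paper: it assumes simple linear interpolation between a maximum-likelihood bigram and an add-one unigram, and the names `train_bigram`, `perplexity`, and the weight `lam` are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def train_bigram(sentences, lam=0.7):
    """Return prob(prev, word) for a linearly interpolated bigram model.

    Assumption: fixed interpolation weight `lam`; this is a generic
    smoothing scheme, not the Stacked estimator described in the paper.
    """
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[1:])  # count predicted tokens only
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # +1: one class for all unseen words

    def prob(prev, word):
        # Add-one smoothed unigram, so unseen words get non-zero mass.
        p_uni = (unigrams[word] + 1) / (total + vocab_size)
        if contexts[prev] == 0:
            return p_uni  # unseen context: fall back to the unigram
        p_bi = bigrams[(prev, word)] / contexts[prev]
        return lam * p_bi + (1 - lam) * p_uni

    return prob


def perplexity(prob, sentences):
    """Perplexity = 2 ** (average negative log2 probability per token)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log2(prob(prev, word))
            n += 1
    return 2 ** (-log_sum / n)


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(train)
pp_seen = perplexity(model, train)
pp_unseen = perplexity(model, [["sat", "the", "cat", "dog"]])
```

Lower perplexity means the model assigns higher probability to the text, so in this toy setup `pp_seen` comes out smaller than `pp_unseen`. Sparser training data makes the choice of smoothing matter more, which is the condition the abstract's three-corpus comparison varies.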

Countries

Italy, Canada

Related Organizations

Fondazione Bruno Kessler (Italy)
Polytechnique Montréal (Canada)

Keywords

language model, speech recognition, 004

Impact by BIP!

	Selected citations: 23. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.