Association pattern language modeling

Publication: Article, 01 Sep 2006
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Transactions on Audio, Speech and Language Processing, volume 14, pages 1719-1728 (issn: 1558-7916)

Authors: Jen-Tzung Chien

doi: 10.1109/tsa.2005.858551


Abstract

Statistical n-gram language modeling is popular for speech recognition and many other applications. The conventional n-gram model, however, cannot adequately capture long-distance language dependencies. This paper presents a novel approach that mines long-distance word associations and incorporates these features into language models based on linear interpolation and maximum entropy (ME) principles. We highlight the discovery of associations among multiple distant words from the training corpus. A mining algorithm is exploited to recursively merge the frequent word subsets and efficiently construct the set of association patterns. By combining the features of association patterns into n-gram models, the association pattern n-grams are estimated; the trigger pair n-gram is a special realization in which only the associations of two distant words are considered. In the experiments on Chinese language modeling, we find that incorporating association patterns significantly reduces the perplexities of n-gram models. Incorporation using ME outperforms that using linear interpolation. The association pattern n-gram is superior to the trigger pair n-gram. The perplexities are further reduced using more association steps. Further, the proposed association pattern n-grams not only raise document classification accuracies but also improve speech recognition rates.
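The recursive merging of frequent word subsets mentioned in the abstract can be illustrated with a minimal Apriori-style sketch. This is not the paper's algorithm: the abstract does not specify the support measure, the context window, or how word distance is handled, so the function name `mine_association_patterns`, the parameters `min_support` and `max_size`, and the choice to count co-occurrence per sentence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_association_patterns(sentences, min_support=2, max_size=3):
    """Return {frozenset(words): support} for frequent word subsets.

    Assumption: support is the number of sentences containing every word
    of the subset, regardless of word order or distance.
    """
    # Treat each sentence as an unordered set of words.
    transactions = [frozenset(s) for s in sentences]

    # Level 1: frequent single words.
    counts = Counter(w for t in transactions for w in t)
    current = {frozenset([w]): c for w, c in counts.items() if c >= min_support}
    patterns = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Merge pairs of frequent (size - 1)-word subsets into candidates
        # that contain exactly `size` words.
        candidates = set()
        for a, b in combinations(current, 2):
            merged = a | b
            if len(merged) == size:
                candidates.add(merged)

        # Keep only the candidates whose support reaches the threshold.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        patterns.update(current)

    return patterns
```

In this sketch, the two-word subsets play the role of trigger pairs, while subsets of three or more words correspond to the more general association patterns; raising the hypothetical `max_size` loosely mirrors using more association steps.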

Related Organizations

National Cheng Kung University
Taiwan

Impact by BIP!

selected citations: 22. These citations are derived from selected sources. This is an alternative to the "Influence" indicator.
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

medical and health sciences

other medical science

