
Summary form only given. Lexicon lookup is an essential part of almost every natural language processing system. A natural language lexicon is a set of strings, each consisting of a word and its associated linguistic data. Its computer representation is a structure that returns the appropriate linguistic data for a given input word. This structure should be small and fast. We propose a method for lexicon compression based on a very efficient trie compression method and the inverted file paradigm. The method was applied to a 664,000-string, 18-Mbyte French phonetic and grammatical electronic dictionary for spelling-to-phonetics conversion. Entries in the lexicon are strings consisting of a word, its phonetic transcription, and some additional codes.
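
To make the lookup structure concrete, here is a minimal Python sketch, not the authors' method. It assumes an uncompressed trie over the words, and a separate table that stores each distinct linguistic-data string once and is referenced by integer id, loosely in the spirit of an inverted file's split between dictionary and stored records. The class names, the `build_demo` helper, and the sample entries (including their transcription and code strings) are invented for illustration only. The trie compression step that the abstract mentions is not shown.

```python
from typing import Dict, List


class TrieNode:
    """One trie node: child links plus ids of data records ending here."""
    __slots__ = ("children", "data_ids")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.data_ids: List[int] = []


class Lexicon:
    """Toy lexicon: words live in a trie, data strings in a shared table."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.data_table: List[str] = []        # each distinct record stored once
        self._data_index: Dict[str, int] = {}  # record -> id in data_table

    def add(self, word: str, data: str) -> None:
        # Reuse the id of an identical record instead of storing it again.
        data_id = self._data_index.get(data)
        if data_id is None:
            data_id = len(self.data_table)
            self.data_table.append(data)
            self._data_index[data] = data_id
        # Walk (and extend) the trie one character at a time.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        if data_id not in node.data_ids:
            node.data_ids.append(data_id)

    def lookup(self, word: str) -> List[str]:
        """Return all data records for `word`, or [] if it is unknown."""
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                return []
            node = nxt
        return [self.data_table[i] for i in node.data_ids]


def build_demo() -> Lexicon:
    """Build a tiny lexicon from made-up (word, data) pairs."""
    lex = Lexicon()
    # Placeholder entries: "transcription|codes" strings are invented.
    for word, data in [
        ("chat", "Sa|N"),
        ("chats", "Sa|N"),    # same record as "chat": stored only once
        ("chant", "SA|N"),
        ("chant", "SA|V"),    # a word may carry several records
    ]:
        lex.add(word, data)
    return lex
```

Calling `build_demo().lookup("chant")` returns both records attached to that word, while an unknown word or a bare prefix such as `"cha"` yields an empty list. Sharing identical records through `data_table` is one simple way to keep the stored linguistic data small; a real system would also compress the trie itself.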
[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
