
doi: 10.1137/0205003
We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input queryword (for example, retrieving all six-letter English words of the form S**R*H where "*" is a "don't care" character). We precisely characterize those balanced hash-coding algorithms withminimum average number of lists examined. Use of the first few letters of each word as a list index is shown to be one such optimal algorithm.A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained. Another efficient variant involves storing each word in several lists. Tree-search algorithms are shown to be approximately as efficient as hash-coding algorithms, on the average. In general, these algorithms require time about O(n
Information storage and retrieval of data, General topics in the theory of software, Symbolic computation and algebraic computation, Algorithms in computer science
Information storage and retrieval of data, General topics in the theory of software, Symbolic computation and algebraic computation, Algorithms in computer science
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 165 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 0.1% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
