
Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in $O(n)$ time, while it requires $O(n\log n)$-bit working space, where $n$ denotes the length of the text. On the other hand, the most space-efficient algorithm requires $O(n)$-bit working space while it runs in $O(n\log n)$ time. It was open whether these indices can be constructed in both $o(n\log n)$ time and $o(n\log n)$-bit working space. This paper breaks the above time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array, which takes $O(n)$ time and $O(n)$-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires $O(n\log^{\epsilon}n)$ time and $O(n)$-bit working space for any $0<\epsilon<1$. Apart from that, our algorithm can also be adopted to build other existing full-text indices, such as compressed suffix tree, compressed suffix arrays, and FM-index. We also study the general case where the size of the alphabet $\Sigma$ is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal $O(n\log|\Sigma|)$-bit working space while running in $O(n\log\log|\Sigma|)$ time and $O(n(\log^{\epsilon}n+\log|\Sigma|))$ time, respectively. These are the first algorithms that achieve $o(n\log n)$ time with optimal working space. Moreover, for the special case where $\log|\Sigma|=O((\log\log n)^{1-\epsilon})$, we can speed up our suffix array construction algorithm to the optimal $O(n)$.
Suffix arrays, 500, Text indexing, Preprocessing, Suffix trees, 004
Suffix arrays, 500, Text indexing, Preprocessing, Suffix trees, 004
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 52 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
