Computing Quasi Suffix Arrays

Publication type: Other literature type, Article; Date: 01 Jan 2003; Language: English; Publisher: Justus-Liebig-Universität, Institut für Informatik, Gießen; Journal: J. Autom. Lang. Comb., volume 8, pages 593-606

Authors: Frantisek Franek; Jan Holub; William F. Smyth; Xiangdong Xiao

doi: 10.25596/jalc-2003-593

Abstract

We introduce quasi suffix arrays as a generalization of suffix arrays for character strings. We show that a quasi suffix array encodes enough of the structure of the string to be a useful construct for many applications where the full power of suffix arrays is not necessary, notably in problems that do not require lexicographical order, for example, pattern matching or calculation of repeating substrings. We are interested in quasi suffix arrays because we believe that they can be calculated by simple, fast, and space-efficient algorithms. As a first step towards this goal, we describe a family DIST of algorithms (inspired by Crochemore's repetitions algorithm) that compute the quasi suffix array in $O(|x| \log |x|)$ time in the average case, where $x$ is the input string. Based on experiments conducted by one of us (Xiao), it appears that in practice our algorithms execute faster than all suffix tree and most suffix array construction algorithms. Though at this time we can only prove that the average-case complexity is $O(|x| \log |x|)$, tests carried out by one of us (Holub) strongly suggest not only that the worst-case complexity may be the same as the average-case complexity, but also that both may in fact be linear. Given the very recent results on computing suffix arrays in linear time by recursive algorithms, the only advantage quasi suffix arrays can have lies in the simplicity and space efficiency of the DIST algorithms, which do not use recursion.
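For orientation, the ordinary suffix array that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard structure and of pattern matching over it, not the quasi suffix array and not a DIST algorithm, neither of which is specified in the abstract. The function names `suffix_array` and `find_occurrences` are hypothetical, and the construction is the naive sort-based one, chosen for brevity rather than speed.

```python
def suffix_array(x: str) -> list[int]:
    """Naive suffix array: start positions of the suffixes of x,
    listed in lexicographic order of the suffixes.
    (Assumption: illustrative only; sorting whole suffixes is slow.)"""
    return sorted(range(len(x)), key=lambda i: x[i:])


def find_occurrences(x: str, sa: list[int], p: str) -> list[int]:
    """All start positions of pattern p in x, found by binary search
    over the lexicographically sorted suffixes in sa."""
    m = len(p)

    # Lower bound: first suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if x[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if x[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] all begin with p; report positions in text order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

The binary search here depends on the lexicographic order of the suffixes. The abstract's claim is that pattern matching and repeat finding can be supported by a structure that does not maintain that order; how the quasi suffix array achieves this is described in the paper itself.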

Journal of Automata, Languages and Combinatorics, Volume 8, Number 4, 2003, 593-606

Keywords

quasi suffix arrays, Combinatorics on words, pattern matching, suffix trees, string algorithms, Nonnumerical algorithms
