
doi: 10.1145/3641854
The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving GSACA ’s non-competitive real-world performance. There is a super-linear algorithm DSH , which relies on the same sorting principle and is faster than DivSufSort , the fastest SACA for over a decade. The purpose of this article is twofold: We analyse the sorting principle used in GSACA and DSH and exploit its properties to give an optimised linear-time algorithm, and we show that it can be very elegantly used to compute both the original extended Burrows-Wheeler transform ( eBWT ) and a bijective version of the Burrows-Wheeler transform ( BBWT ) in linear time. We call the algorithm “generic,” since it can be used to compute the regular suffix array and the variants used for the BBWT and eBWT . Our suffix array construction algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH . Our BBWT -algorithm is faster than or competitive with all other tested BBWT construction implementations on large or repetitive data, and our eBWT -algorithm is faster than all other programs on data that is not extremely repetitive.
String algorithms, Data structures, Bijective Burrows-Wheeler transform, suffix array, string algorithms, Algorithms on strings, bijective Burrows-Wheeler transform, Suffix sorting, Suffix array, suffix sorting, info:eu-repo/classification/ddc/530, Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science), Searching and sorting
String algorithms, Data structures, Bijective Burrows-Wheeler transform, suffix array, string algorithms, Algorithms on strings, bijective Burrows-Wheeler transform, Suffix sorting, Suffix array, suffix sorting, info:eu-repo/classification/ddc/530, Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science), Searching and sorting
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 6 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
