
arXiv: 1908.04686
handle: 10278/3734792 , 11568/1062324
We show how to build several data structures of central importance to string processing, taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let $n$ be the text length and $��$ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in $O(n\log��)$ time using just $o(n\log��)$ bits of working space on top of the input BWT. Using these algorithms as building blocks, for any parameter $0 < ��\leq 1$ we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in $O\left(n(\log��+ ��^{-1}\cdot \log\log n)\right)$ time using at most $n\log��\cdot(��+ o(1))$ bits of working space on top of the input BWT and the output. In particular, this implies that we can build a compressed suffix tree from the BWT using just succinct working space (i.e. $o(n\log��)$ bits) and any time in $��(n\log��) + ��(n\log\log n)$. This improves the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(n\log n)$ time. We also consider the problem of merging BWTs of string collections, and provide a solution running in $O(n\log��)$ time and using just $o(n\log��)$ bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms use (in RAM) as few as $n$ bits on top of a packed representation of the input/output and process data as fast as $2.92$ megabases per second.
arXiv admin note: substantial text overlap with arXiv:1901.05226
FOS: Computer and information sciences, Burrows-Wheeler transform; Compressed suffix tree; LCP; PLCP;, Burrows-Wheeler transform, compressed suffix tree, LCP, PLCP, Data structures, PLCP, compressed suffix tree, Burrows-Wheeler transform, Algorithms on strings, LCP, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
FOS: Computer and information sciences, Burrows-Wheeler transform; Compressed suffix tree; LCP; PLCP;, Burrows-Wheeler transform, compressed suffix tree, LCP, PLCP, Data structures, PLCP, compressed suffix tree, Burrows-Wheeler transform, Algorithms on strings, LCP, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 5 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
