
Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.
Accepted to The Computer Journal. The implementation is available at https://github.com/jltsiren/relative-fm
FOS: Computer and information sciences, Original article, Computer and information sciences, DE-BRUIJN GRAPHS, SEQUENCES, compressed text indexing, suffix trees, RANDOM-ACCESS, ARRAYS, Compressed text indexing, Repetitive collections, Computer Science - Data Structures and Algorithms, COMPRESSED TEXT, Data Structures and Algorithms (cs.DS), Suffix trees, repetitive collections, CODES
FOS: Computer and information sciences, Original article, Computer and information sciences, DE-BRUIJN GRAPHS, SEQUENCES, compressed text indexing, suffix trees, RANDOM-ACCESS, ARRAYS, Compressed text indexing, Repetitive collections, Computer Science - Data Structures and Algorithms, COMPRESSED TEXT, Data Structures and Algorithms (cs.DS), Suffix trees, repetitive collections, CODES
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 9 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
