String Indexing with Compressed Patterns

Publication type: Article, Preprint, Conference object
Date: 26 Sep 2023
Embargo end date: 01 Jan 2019
Countries: Denmark, Germany
Language: English
Publisher: Association for Computing Machinery (ACM)
Journal: ACM Transactions on Algorithms, volume 19, pages 1-19 (ISSN: 1549-6325, eISSN: 1549-6333)

Authors: Philip Bille; Inge Li Gørtz; Teresa Anna Steiner

doi: 10.1145/3607141, 10.48550/arxiv.1909.11930

arXiv: 1909.11930

Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this article, we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way, we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
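To make the notion of a pattern "given in compressed form" concrete, the following is a minimal sketch of an LZ77-style parse and its decoder. It is not the data structure from the article. The phrase format `(source, length, next_char)`, the function names `lz77_parse` and `lz77_decode`, and the quadratic-time greedy search are all assumptions chosen for illustration, and the exact LZ77 variant used in the article may differ.

```python
def lz77_parse(p):
    """Greedy LZ77-style parse (illustrative variant, not the article's).

    Each phrase is (source, length, next_char): copy `length` characters
    starting at position `source` of the text decoded so far, then append
    `next_char`. Copies may overlap the phrase being produced.
    """
    phrases = []
    i, n = 0, len(p)
    while i < n:
        best_len, best_src = 0, 0
        # Naive search over every earlier start position.
        for src in range(i):
            length = 0
            # Stop one short of the end so an explicit next_char always exists.
            while i + length < n - 1 and p[src + length] == p[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, p[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Invert lz77_parse by replaying the copies character by character."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])  # may read characters written in this phrase
        out.append(ch)
    return "".join(out)


pattern = "abababab"
compressed = lz77_parse(pattern)
print(len(pattern), len(compressed), compressed)
```

In the client-server scenario from the abstract, the client would send the phrase list rather than the pattern itself. A server that calls something like `lz77_decode` first spends time proportional to the uncompressed pattern length, which is the step the article aims to avoid by working with the phrases directly.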

Countries

Denmark, Germany

Related Organizations

Technical University of Denmark (Denmark)
Schloss Dagstuhl – Leibniz Center for Informatics (Germany)
Leibniz Association (Germany)

Keywords

FOS: Computer and information sciences, Compression, String indexing, Pattern matching, Computer science, 004, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average