Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression

Publication type: Part of book or chapter of book, Article, Preprint, Conference object
Date: 01 Jan 2017
Embargo end date: 01 Jan 2017
Publisher: Springer International Publishing

Authors: Takuya Takagi; Keisuke Goto; Yuta Fujishige; Shunsuke Inenaga; Hiroki Arimura

doi: 10.1007/978-3-319-67428-5_26, 10.48550/arxiv.1705.09779

arXiv: 1705.09779

Abstract

In this paper, we propose a novel approach to combine \emph{compact directed acyclic word graphs} (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with $O(\tilde e_T \log n)$ bits of space allowing for $O(\log n)$-time random and $O(1)$-time sequential accesses to edge labels, and $O(m \log \sigma + occ)$-time pattern matching. Here, $\tilde e_T$ is the number of all extensions of maximal repeats in $T$, $n$ and $m$ are respectively the lengths of the text $T$ and a given pattern, $\sigma$ is the alphabet size, and $occ$ is the number of occurrences of the pattern in $T$. The repetitiveness measure $\tilde e_T$ is known to be much smaller than the text length $n$ for highly repetitive text. For constant alphabets, our L-CDAWGs achieve $O(m + occ)$ pattern matching time with $O(e_T^r \log n)$ bits of space, which improves the pattern matching time of Belazzougui et al.'s run-length BWT-CDAWGs by a factor of $\log \log n$, with the same space complexity. Here, $e_T^r$ is the number of right extensions of maximal repeats in $T$. As a byproduct, our result gives a way of constructing an SLP of size $O(\tilde e_T)$ for a given text $T$ in $O(n + \tilde e_T \log \sigma)$ time.
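The abstract refers to straight-line programs (SLPs) and to random access into compressed text. As a rough illustration of those two ideas only, the following Python sketch shows a generic SLP in which every rule is either a single character or a pair of earlier rules, with random access done by descending from the root using stored expansion lengths. This is not the construction from the paper: the names `Terminal`, `Pair`, `expansion_lengths`, and `access` are hypothetical, the toy grammar is hand-written, and access here costs time proportional to the grammar height rather than the bounds stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Terminal:
    """Rule that expands to a single character."""
    char: str


@dataclass(frozen=True)
class Pair:
    """Rule that expands to rules[left] followed by rules[right]."""
    left: int   # index of an earlier rule
    right: int  # index of an earlier rule


Rule = Union[Terminal, Pair]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string each rule expands to (rules are in bottom-up order)."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, Terminal):
            lengths.append(1)
        else:
            lengths.append(lengths[rule.left] + lengths[rule.right])
    return lengths


def access(rules: List[Rule], lengths: List[int], root: int, i: int) -> str:
    """Return character i of the expansion of rules[root] without decompressing it."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    node = root
    while isinstance(rules[node], Pair):
        rule = rules[node]
        if i < lengths[rule.left]:
            node = rule.left           # position falls in the left half
        else:
            i -= lengths[rule.left]    # skip the left half, continue on the right
            node = rule.right
    return rules[node].char


# Hand-written toy grammar (an assumption for illustration): 5 rules for "abababab".
rules: List[Rule] = [
    Terminal("a"),  # rule 0 -> a
    Terminal("b"),  # rule 1 -> b
    Pair(0, 1),     # rule 2 -> ab
    Pair(2, 2),     # rule 3 -> abab
    Pair(3, 3),     # rule 4 -> abababab
]
lengths = expansion_lengths(rules)
text = "".join(access(rules, lengths, 4, i) for i in range(lengths[4]))
```

In this toy case five rules describe a text of length eight, which hints at why grammar size can be much smaller than $n$ for highly repetitive text.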

12 pages, 2 figures

Keywords

FOS: Computer and information sciences, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)

Related research (7)

- On-line construction of compact directed acyclic word graphs (2001)
- Compact directed acyclic word graphs for a sliding window (2002)
- Compact suffix automaton in a sliding window (original title: Kompaktní sufixový automat v posuvném okně) (2011)
- Suffix Trees, DAWGs and CDAWGs for Forward and Backward Tries (2020)
- Composite Repetition-Aware Data Structures (2015)
- Suffix graphs and lossless data compression (original title: Sufixové grafy a bezeztrátová komprese dat) (2013)
- Universal compressed text indexing (2019)

Impact by BIP!

- Selected citations: 11. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.