
arXiv: 1912.11944
handle: 10278/3729796, 11385/192326
This work introduces a companion reproducible paper whose aim is to allow the exact replication of the methods, experiments, and results discussed in a previous work [5]. In that parent paper, we proposed a wide range of index compression techniques that exploit the fact that highly repetitive collections consist mostly of documents that are near-copies of others. More concretely, this work describes a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated on various document collections. The corresponding experiments are explained carefully, with precise details about the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.
This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Replication framework available at: https://github.com/migumar2/uiHRDC/
Self-index, Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Repetitive document collections, Inverted index, Reproducibility, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)
