
arXiv: 1602.02620
Locality-sensitive hashing (LSH) has emerged as the dominant algorithmic technique for similarity search with strong performance guarantees in high-dimensional spaces. A drawback of traditional LSH schemes is that they may have \emph{false negatives}, i.e., the recall is less than 100\%. This limits the applicability of LSH in settings requiring precise performance guarantees. Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called \emph{Fast CoveringLSH (fcLSH)}. Inheriting the design benefits of CoveringLSH our method avoids false negatives and always reports all near neighbors. Compared to CoveringLSH we achieve an asymptotic improvement to the hash function computation time from $\mathcal{O}(dL)$ to $\mathcal{O}(d + L\log{L})$, where $d$ is the dimensionality of data and $L$ is the number of hash tables. Our experiments on synthetic and real-world data sets demonstrate that \emph{fcLSH} is comparable (and often superior) to traditional hashing-based approaches for search radius up to 20 in high-dimensional Hamming space.
Short version appears in Proceedings of CIKM 2016
Locality-sensitive hashing, FOS: Computer and information sciences, Hamming space, Hash function computation, Databases (cs.DB), High-dimensional similarity search, 004, Computer Science - Information Retrieval, Computer Science - Databases, Computer Science - Data Structures and Algorithms, False negatives, Data Structures and Algorithms (cs.DS), Information Retrieval (cs.IR)
Locality-sensitive hashing, FOS: Computer and information sciences, Hamming space, Hash function computation, Databases (cs.DB), High-dimensional similarity search, 004, Computer Science - Information Retrieval, Computer Science - Databases, Computer Science - Data Structures and Algorithms, False negatives, Data Structures and Algorithms (cs.DS), Information Retrieval (cs.IR)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 9 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
