Neighbor-sensitive hashing

Publication: Article, 01 Nov 2015, English. Publisher: Association for Computing Machinery (ACM). Journal: Proceedings of the VLDB Endowment, volume 9, pages 144-155 (ISSN: 2150-8097)

Authors: Yongjoo Park; Michael Cafarella; Barzan Mozafari

doi: 10.14778/2850583.2850589


Abstract

Approximate kNN (k-nearest neighbor) techniques using binary hash functions are among the most commonly used approaches for overcoming the prohibitive cost of performing exact kNN queries. However, the success of these techniques largely depends on their hash functions' ability to distinguish kNN items; that is, the kNN items retrieved based on data items' hashcodes should include as many true kNN items as possible. A widely-adopted principle for this process is to ensure that similar items are assigned to the same hashcode so that the items with the hashcodes similar to a query's hashcode are likely to be true neighbors. In this work, we abandon this heavily-utilized principle and pursue the opposite direction for generating more effective hash functions for kNN tasks. That is, we aim to increase the distance between similar items in the hashcode space, instead of reducing it. Our contribution begins by providing theoretical analysis on why this revolutionary and seemingly counter-intuitive approach leads to a more accurate identification of kNN items. Our analysis is followed by a proposal for a hashing algorithm that embeds this novel principle. Our empirical studies confirm that a hashing algorithm based on this counter-intuitive idea significantly improves the efficiency and accuracy of state-of-the-art techniques.
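To make the setting in the abstract concrete, the following is a minimal sketch of the conventional baseline it describes: items are mapped to binary hashcodes, candidates are ranked by Hamming distance to the query's hashcode, and quality is measured by how many true kNN items are retrieved. It uses random-hyperplane hashing as a stand-in, which is an assumption made for illustration and is not the neighbor-sensitive hashing algorithm proposed in the paper. All function names, the data, and the parameter values are hypothetical.

```python
import numpy as np

def make_hash_function(dim, n_bits, rng):
    """Build a random-hyperplane binary hash (illustrative baseline, not NSH).

    Each bit is 1 if the item lies on the positive side of a random hyperplane.
    """
    planes = rng.standard_normal((n_bits, dim))

    def hash_fn(x):
        # Accept a single vector or a batch; always return a 2-D array of bits.
        return (np.atleast_2d(x) @ planes.T > 0).astype(np.uint8)

    return hash_fn

def hamming_knn(query_code, data_codes, k):
    """Return indices of the k items whose hashcodes are closest in Hamming distance."""
    dists = np.count_nonzero(data_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

def exact_knn(query, data, k):
    """Return indices of the k true nearest neighbors under Euclidean distance."""
    dists = np.linalg.norm(data - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]

def recall_at_k(approx_idx, exact_idx):
    """Fraction of the true kNN items that the hashcode-based search retrieved."""
    found = set(approx_idx.tolist()) & set(exact_idx.tolist())
    return len(found) / len(exact_idx)

# Hypothetical data: 1000 random 16-dimensional items and one query.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16))
query = rng.standard_normal(16)

hash_fn = make_hash_function(dim=16, n_bits=32, rng=rng)
codes = hash_fn(data)          # shape (1000, 32)
query_code = hash_fn(query)    # shape (1, 32)

approx = hamming_knn(query_code, codes, k=10)
exact = exact_knn(query, data, k=10)
print("recall@10:", recall_at_k(approx, exact))
```

The recall printed at the end is the quantity the abstract refers to when it says the retrieved items "should include as many true kNN items as possible"; a different hash function can be swapped in through `make_hash_function` and compared on the same measure.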

Related Organizations

University of Michigan–Ann Arbor
United States
University of Michigan–Flint
United States

9 Research products

- Modeling Dialogues with Hashcode Representations: A Nonparametric Approach (2020)
- Large-Scale Semantic Data Management For Urban Computing Applications (2019)
- Semantic Cluster Unary Loss for Efficient Deep Hashing (2019)
- Hey! Are you injecting side effect?: A tool for detecting purity changes in java methods (2016)
- Case study (2004)
- Learning to hash for large scale image retrieval (2016)
- A Study on Fingerprint Hash Code Generation Using Euclidean Distance for Identifying a User (2017)
- Towards efficient similarity search with semantic hashing techniques (2021)
- Software Security analysis, static and dynamic testing in java and C environment, a comparative study (2012)

Impact by BIP!

- Selected citations: 23. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.