Sequential Discrete Hashing for Scalable Cross-modality Similarity Retrieval

Li Liu, Zijia Lin, Ling Shao, Fumin Shen, Guiguang Ding, Jungong Han (2017)

With the dramatic development of the Internet, performing large-scale retrieval over multimodal web data has become one of the most popular yet challenging problems in computer vision and multimedia. Recently, hashing methods have been widely used for fast nearest-neighbor search in large-scale data spaces, embedding high-dimensional feature descriptors into a low-dimensional, similarity-preserving Hamming space. Inspired by this, in this paper we introduce a novel supervised cross-modality hashing framework that generates unified binary codes for instances represented in different modalities. In particular, during the learning phase each bit of a code is learned sequentially with a discrete optimization scheme that jointly minimizes its empirical loss based on a boosting strategy. In this bitwise manner, hash functions are then learned for each modality, mapping the corresponding representations into unified hash codes. We refer to this approach as cross-modality sequential discrete hashing (CSDH); it effectively reduces the quantization error arising from the oversimplified rounding-off step and thus leads to high-quality binary codes. In the test phase, a simple fusion scheme merges the hashing results predicted for an unseen instance from its different modalities into a unified hash code used for the final retrieval. The proposed CSDH has been systematically evaluated on three standard data sets, Wiki, MIRFlickr, and NUS-WIDE, and the results show that our method significantly outperforms state-of-the-art multimodal hashing techniques.
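The abstract gives no implementation details, so the following is only a minimal sketch of the retrieval pipeline it describes, not the authors' CSDH method: per-modality hash functions map features into a shared Hamming space, and retrieval ranks database items by Hamming distance to the query code. The projection matrices (W_img, W_txt), feature dimensions, and function names are hypothetical stand-ins for the learned hash functions, and the random projections used here are placeholders rather than the sequentially learned bits of the paper.

    import numpy as np

    def hash_codes(features, W):
        # Sign of a linear projection gives a {0, 1} binary code per instance.
        return (features @ W > 0).astype(np.uint8)

    def hamming_distances(query_code, db_codes):
        # Count differing bits between the query code and every database code.
        return np.count_nonzero(db_codes != query_code, axis=1)

    rng = np.random.default_rng(0)
    n_bits = 32

    # Hypothetical stand-ins for the learned per-modality hash functions.
    W_img = rng.standard_normal((512, n_bits))   # image modality: 512-D features
    W_txt = rng.standard_normal((300, n_bits))   # text modality: 300-D features

    db_img_feats = rng.standard_normal((1000, 512))  # toy image database
    query_txt_feat = rng.standard_normal((1, 300))   # toy text query

    db_codes = hash_codes(db_img_feats, W_img)      # unified codes for database items
    query_code = hash_codes(query_txt_feat, W_txt)  # unified code for the query

    # Rank database items by Hamming distance in the shared code space.
    ranking = np.argsort(hamming_distances(query_code, db_codes))
    print("Top-5 retrieved image indices:", ranking[:5])

The test-phase fusion step mentioned in the abstract, which merges the codes predicted from an unseen instance's different modalities, is omitted here because the abstract does not specify how that merging is performed.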
