publication . Article . Contribution for newspaper or weekly magazine . Part of book or chapter of book . Preprint . 2018

ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms

Aumüller, Martin; Bernhardsson, Erik; Faithfull, Alexander;
Open Access English
  • Published: 15 Jul 2018
  • Country: Denmark
Abstract
This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting front-ends can visualise these as images, $\LaTeX$ plots, and websites with intera...
Subjects
free text keywords: Hardware and Architecture, Software, Information Systems, Computer Science - Information Retrieval, Computer Science - Databases, H.3.3
Funded by
EC| SSS
Project
SSS
Scalable Similarity Search
  • Funder: European Commission (EC)
  • Project Code: 614331
  • Funding stream: FP7 | SP2 | ERC

MRPT - fast nearest neighbor search with random projection, https://github.com/ teemupitkanen/mrpt NGT: PANNG, https://github.com/yahoojapan/NGT Ahle, T.D., Aumüller, M., Pagh, R.: Parameter-free locality sensitive hashing for spherical range reporting. In: SODA'17. pp. 239-256 Alman, J., Williams, R.: Probabilistic polynomials and hamming nearest neighbors. In: FOCS'15.

pp. 136-150 Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I.P., Schmidt, L.: Practical and optimal LSH for angular distance. In: NIPS'15. pp. 1225-1233. https://falconn-lib.org/ Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509-517 (1975) Bernhardsson, E.: Annoy, https://github.com/spotify/annoy Boytsov, L., Naidan, B.: Engineering e cient and e ective non-metric space library. In: SISAP'13.

pp. 280-293 Boytsov, L., Novak, D., Malkov, Y., Nyberg, E.: O the beaten path: Let's replace term-based retrieval with k-nn search. In: CIKM'16. pp. 1099-1108 Christiani, T., Pagh, R., Sivertsen, J.: Scalable and robust set similarity join. In: ICDE'2018 (2018) Ciaccia, P., Patella, M., Zezula, P.: M-tree: An e cient access method for similarity search in metric spaces. In: VLDB'97. pp. 426-435 (1997) Curtin, R.R., Cline, J.R., Slagle, N.P., March, W.B., Ram, P., Mehta, N.A., Gray, A.G.: MLPACK: A scalable C++ machine learning library. Journal of Machine Learning Research 14, 801-805 (2013) Dong, W.: KGraph, https://github.com/aaalgo/kgraph Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning.

In: CIKM'08. pp. 669-678. ACM, http://lshkit.sourceforge.net/ Edel, M., Soni, A., Curtin, R.R.: An automatic benchmarking system. In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014) Heo, J.P., Lee, Y., He, J., Chang, S.F., Yoon, S.E.: Spherical hashing: Binary code embedding with hyperspheres. IEEE TPAMI 37(11), 2304-2316 (2015) Hyvönen, V., Pitkänen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., Roos, T.: Fast nearest neighbor search through sparse random projections and voting. In: Big Data (Big Data), 2016 IEEE International Conference on. pp. 881-888. IEEE (2016) Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: STOC'98. pp. 604-613 Iwasaki, M.: Pruned bi-directed k-nearest neighbor graph for proximity search. In: SISAP 2016.

pp. 20-33 (2016), https://doi.org/10.1007/978-3-319-46759-7_2 Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with gpus. CoRR abs/1702.08734 (2017) Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of lipschitz maps into banach spaces.

Israel Journal of Mathematics 54(2), 129-138 (1986) Kriegel, H., Schubert, E., Zimek, A.: The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowl. Inf. Syst. 52(2), 341-378 (2017) Laarhoven, T.: Graph-Based Time-Space Trade-O s for Approximate Near Neighbors. In: Speckmann, B., Tóth, C.D. (eds.) 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 99, pp. 57:1-57:14.

Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018), http://drops.

dagstuhl.de/opus/volltexte/2018/8770 Lam, M.: Word2bits - quantized word vectors. CoRR abs/1803.05651 (2018), http://arxiv.org/ abs/1803.05651 LeCun, Y., Bottou, L., Bengio, Y., Ha ner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278-2324 (1998) Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR abs/1610.02455 (2016), http://arxiv.org/abs/1610.02455 Lyst Engineering: Rpforest, https://github.com/lyst/rpforest Malkov, Y.A., Yashunin, D.A.: E cient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. ArXiv e-prints (Mar 2016) Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45, 61-68 (2014) McInnes, L.: PyNNDescent, https://github.com/lmcinnes/pynndescent Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS'13. pp. 3111-3119 Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm con guration.

In: VISSAPP'09. pp. 331-340. INSTICC Press Norouzi, M., Punjani, A., Fleet, D.J.: Fast search in hamming space with multi-index hashing. In: CVPR'12. pp. 3108-3115. IEEE Pham, N.: Hybrid LSH: faster near neighbors reporting in high-dimensional space. In: EDBT'17.

pp. 454-457 Van Rijn, J.N., Bischl, B., Torgo, L., Gao, B., Umaashankar, V., Fischer, S., Winter, P., Wiswedel, B., Berthold, M.R., Vanschoren, J.: Openml: A collaborative science platform. In: ECML PKDD. pp.

645-649. Springer (2013) Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: A survey. CoRR abs/1408.2927 (2014), http://arxiv.org/abs/1408.2927 Wang, Y., Shrivastava, A., Wang, J., Ryu, J.: Randomized algorithms accelerated over CPU-GPU for ultra-high dimensional similarity search. In: SIGMOD. pp. 889-903 (2018), http://doi.acm.

org/10.1145/3183713.3196925 Williams, R.: A new algorithm for optimal 2-constraint satisfaction and its implications. Theor.

Comput. Sci. 348(2-3), 357-365 (2005) Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with M-Trees. VLDB J. 7(4), 275-293 (1998)

Abstract
This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting front-ends can visualise these as images, $\LaTeX$ plots, and websites with intera...
Subjects
free text keywords: Hardware and Architecture, Software, Information Systems, Computer Science - Information Retrieval, Computer Science - Databases, H.3.3
Funded by
EC| SSS
Project
SSS
Scalable Similarity Search
  • Funder: European Commission (EC)
  • Project Code: 614331
  • Funding stream: FP7 | SP2 | ERC

MRPT - fast nearest neighbor search with random projection, https://github.com/ teemupitkanen/mrpt NGT: PANNG, https://github.com/yahoojapan/NGT Ahle, T.D., Aumüller, M., Pagh, R.: Parameter-free locality sensitive hashing for spherical range reporting. In: SODA'17. pp. 239-256 Alman, J., Williams, R.: Probabilistic polynomials and hamming nearest neighbors. In: FOCS'15.

pp. 136-150 Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I.P., Schmidt, L.: Practical and optimal LSH for angular distance. In: NIPS'15. pp. 1225-1233. https://falconn-lib.org/ Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509-517 (1975) Bernhardsson, E.: Annoy, https://github.com/spotify/annoy Boytsov, L., Naidan, B.: Engineering e cient and e ective non-metric space library. In: SISAP'13.

pp. 280-293 Boytsov, L., Novak, D., Malkov, Y., Nyberg, E.: O the beaten path: Let's replace term-based retrieval with k-nn search. In: CIKM'16. pp. 1099-1108 Christiani, T., Pagh, R., Sivertsen, J.: Scalable and robust set similarity join. In: ICDE'2018 (2018) Ciaccia, P., Patella, M., Zezula, P.: M-tree: An e cient access method for similarity search in metric spaces. In: VLDB'97. pp. 426-435 (1997) Curtin, R.R., Cline, J.R., Slagle, N.P., March, W.B., Ram, P., Mehta, N.A., Gray, A.G.: MLPACK: A scalable C++ machine learning library. Journal of Machine Learning Research 14, 801-805 (2013) Dong, W.: KGraph, https://github.com/aaalgo/kgraph Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning.

In: CIKM'08. pp. 669-678. ACM, http://lshkit.sourceforge.net/ Edel, M., Soni, A., Curtin, R.R.: An automatic benchmarking system. In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014) Heo, J.P., Lee, Y., He, J., Chang, S.F., Yoon, S.E.: Spherical hashing: Binary code embedding with hyperspheres. IEEE TPAMI 37(11), 2304-2316 (2015) Hyvönen, V., Pitkänen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., Roos, T.: Fast nearest neighbor search through sparse random projections and voting. In: Big Data (Big Data), 2016 IEEE International Conference on. pp. 881-888. IEEE (2016) Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: STOC'98. pp. 604-613 Iwasaki, M.: Pruned bi-directed k-nearest neighbor graph for proximity search. In: SISAP 2016.

pp. 20-33 (2016), https://doi.org/10.1007/978-3-319-46759-7_2 Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with gpus. CoRR abs/1702.08734 (2017) Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of lipschitz maps into banach spaces.

Israel Journal of Mathematics 54(2), 129-138 (1986) Kriegel, H., Schubert, E., Zimek, A.: The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowl. Inf. Syst. 52(2), 341-378 (2017) Laarhoven, T.: Graph-Based Time-Space Trade-O s for Approximate Near Neighbors. In: Speckmann, B., Tóth, C.D. (eds.) 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 99, pp. 57:1-57:14.

Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018), http://drops.

dagstuhl.de/opus/volltexte/2018/8770 Lam, M.: Word2bits - quantized word vectors. CoRR abs/1803.05651 (2018), http://arxiv.org/ abs/1803.05651 LeCun, Y., Bottou, L., Bengio, Y., Ha ner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278-2324 (1998) Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR abs/1610.02455 (2016), http://arxiv.org/abs/1610.02455 Lyst Engineering: Rpforest, https://github.com/lyst/rpforest Malkov, Y.A., Yashunin, D.A.: E cient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. ArXiv e-prints (Mar 2016) Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45, 61-68 (2014) McInnes, L.: PyNNDescent, https://github.com/lmcinnes/pynndescent Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS'13. pp. 3111-3119 Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm con guration.

In: VISSAPP'09. pp. 331-340. INSTICC Press Norouzi, M., Punjani, A., Fleet, D.J.: Fast search in hamming space with multi-index hashing. In: CVPR'12. pp. 3108-3115. IEEE Pham, N.: Hybrid LSH: faster near neighbors reporting in high-dimensional space. In: EDBT'17.

pp. 454-457 Van Rijn, J.N., Bischl, B., Torgo, L., Gao, B., Umaashankar, V., Fischer, S., Winter, P., Wiswedel, B., Berthold, M.R., Vanschoren, J.: Openml: A collaborative science platform. In: ECML PKDD. pp.

645-649. Springer (2013) Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: A survey. CoRR abs/1408.2927 (2014), http://arxiv.org/abs/1408.2927 Wang, Y., Shrivastava, A., Wang, J., Ryu, J.: Randomized algorithms accelerated over CPU-GPU for ultra-high dimensional similarity search. In: SIGMOD. pp. 889-903 (2018), http://doi.acm.

org/10.1145/3183713.3196925 Williams, R.: A new algorithm for optimal 2-constraint satisfaction and its implications. Theor.

Comput. Sci. 348(2-3), 357-365 (2005) Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with M-Trees. VLDB J. 7(4), 275-293 (1998)

Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue