publication . Conference object . 2017

A Progressive k-d tree for Approximate k-Nearest Neighbors

Jo, Jaemin; Seo, Jinwook; Fekete, Jean-Daniel;
Open Access English
  • Published: 02 Oct 2017
  • Publisher: HAL CCSD
  • Country: France
Abstract
International audience; We present a progressive algorithm for approximate k-nearest neighbor search. Although the use of k-nearest neighbor libraries (KNN) is common in many data analysis methods, most KNN algorithms can only be run when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and can exceed the latency required by progressive systems. Exceeding this latency significantly impacts the interactivity of visualization systems especially when dealing with large-scale data. We improve traditional k-d trees for progressive app...
Subjects
free text keywords: Approximate k-Nearest-Neighbors, Progressive Data Analysis, Algorithm, Real-Time, ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.3: Information Search and Retrieval, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
32 references, page 1 of 3

[1] S. Albers. Online algorithms: a survey. Mathematical Programming, 97(1):3-26, 2003. doi: 10.1007/s10107-003-0436-0 1

[2] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pp. 459-468. IEEE, 2006. doi: 10.1145/1327452.1327494 2 [OpenAIRE]

[3] M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for similarity search. In Proceedings of the 14th international conference on World Wide Web, pp. 651-660. ACM, 2005. doi: 10. 1145/1060745.1060840 2 [OpenAIRE]

[4] J. S. Beis and D. G. Lowe. Shape indexing using approximate nearestneighbour search in high-dimensional spaces. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on, pp. 1000-1006. IEEE, 1997. doi: 10.1109/CVPR.1997. 609451 1

[5] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509-517, 1975. doi: 10 .1145/361002.361007 1, 2

[6] E. Bernhardsson. Annoy. https://github.com/spotify/annoy. Last accessed: 2017-07-22. 2

[7] E. Bernhardsson. Benchmarking nearest neighbors. https://github. com/erikbern/ann-benchmarks. Last accessed: 2017-07-22. 1, 2

[8] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 537-546. ACM, 2008. doi: 10 .1145/1374376.1374452 2

[9] J.-D. Fekete and R. Primet. Progressive analytics: A computation paradigm for exploratory data analysis. ArXiv e-prints, July 2016. 1, 4

[10] K. Fukunaga. Introduction to statistical pattern recognition. Academic press, 2013. 1

[11] K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, p. 1312, 2011. doi: 10.5591/978-1-57735-516-8/IJCAI11-222 2

[12] Y. Jia, J. Wang, G. Zeng, H. Zha, and X.-S. Hua. Optimizing kd-trees for scalable visual descriptor indexing. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3392-3399. IEEE, 2010. doi: 10.1109/CVPR.2010.5540006 2

[13] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Computer Vision, 2009 IEEE 12th International Conference on, pp. 2130-2137. IEEE, 2009. doi: 10.1109/ICCV .2009.5459466 2 [OpenAIRE]

[14] B. Leibe, K. Mikolajczyk, and B. Schiele. Efficient clustering and matching for object class recognition. In Proceedings of the British Machine Vision Conference, pp. 789-798. BMVA Press, 2006. doi: 10. 5244/C.20.81 2 [OpenAIRE]

[15] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, pp. 950-961. VLDB Endowment, 2007. 2

32 references, page 1 of 3
Abstract
International audience; We present a progressive algorithm for approximate k-nearest neighbor search. Although the use of k-nearest neighbor libraries (KNN) is common in many data analysis methods, most KNN algorithms can only be run when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and can exceed the latency required by progressive systems. Exceeding this latency significantly impacts the interactivity of visualization systems especially when dealing with large-scale data. We improve traditional k-d trees for progressive app...
Subjects
free text keywords: Approximate k-Nearest-Neighbors, Progressive Data Analysis, Algorithm, Real-Time, ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.3: Information Search and Retrieval, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
32 references, page 1 of 3

[1] S. Albers. Online algorithms: a survey. Mathematical Programming, 97(1):3-26, 2003. doi: 10.1007/s10107-003-0436-0 1

[2] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pp. 459-468. IEEE, 2006. doi: 10.1145/1327452.1327494 2 [OpenAIRE]

[3] M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for similarity search. In Proceedings of the 14th international conference on World Wide Web, pp. 651-660. ACM, 2005. doi: 10. 1145/1060745.1060840 2 [OpenAIRE]

[4] J. S. Beis and D. G. Lowe. Shape indexing using approximate nearestneighbour search in high-dimensional spaces. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on, pp. 1000-1006. IEEE, 1997. doi: 10.1109/CVPR.1997. 609451 1

[5] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509-517, 1975. doi: 10 .1145/361002.361007 1, 2

[6] E. Bernhardsson. Annoy. https://github.com/spotify/annoy. Last accessed: 2017-07-22. 2

[7] E. Bernhardsson. Benchmarking nearest neighbors. https://github. com/erikbern/ann-benchmarks. Last accessed: 2017-07-22. 1, 2

[8] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 537-546. ACM, 2008. doi: 10 .1145/1374376.1374452 2

[9] J.-D. Fekete and R. Primet. Progressive analytics: A computation paradigm for exploratory data analysis. ArXiv e-prints, July 2016. 1, 4

[10] K. Fukunaga. Introduction to statistical pattern recognition. Academic press, 2013. 1

[11] K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, p. 1312, 2011. doi: 10.5591/978-1-57735-516-8/IJCAI11-222 2

[12] Y. Jia, J. Wang, G. Zeng, H. Zha, and X.-S. Hua. Optimizing kd-trees for scalable visual descriptor indexing. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3392-3399. IEEE, 2010. doi: 10.1109/CVPR.2010.5540006 2

[13] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Computer Vision, 2009 IEEE 12th International Conference on, pp. 2130-2137. IEEE, 2009. doi: 10.1109/ICCV .2009.5459466 2 [OpenAIRE]

[14] B. Leibe, K. Mikolajczyk, and B. Schiele. Efficient clustering and matching for object class recognition. In Proceedings of the British Machine Vision Conference, pp. 789-798. BMVA Press, 2006. doi: 10. 5244/C.20.81 2 [OpenAIRE]

[15] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, pp. 950-961. VLDB Endowment, 2007. 2

32 references, page 1 of 3
Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue