Publication • Preprint • 2017

Accelerated Hierarchical Density Clustering

McInnes, Leland; Healy, John
Open Access • English
Published: 20 May 2017
Abstract
We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm offers performance comparable to DBSCAN while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density-based clustering. Library available at: https://github.com/scikit-learn-contrib/hdbscan
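
A minimal usage sketch (not part of the original record), assuming the public API of the hdbscan package linked above, which follows the scikit-learn estimator interface. The toy dataset and the min_cluster_size value are illustrative choices, not values from the paper.

    # Hedged example: cluster blobs of differing density without choosing a distance scale.
    import hdbscan
    from sklearn.datasets import make_blobs

    # Toy data: three Gaussian blobs with different spreads (variable density).
    X, _ = make_blobs(n_samples=1000, centers=3,
                      cluster_std=[0.5, 1.0, 2.0], random_state=42)

    # No eps-style distance scale is required; the main parameter is the
    # smallest group of points we are willing to call a cluster.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=15)
    labels = clusterer.fit_predict(X)   # label -1 marks noise points

    print(sorted(set(labels)))

Unlike DBSCAN's eps, min_cluster_size is a statement about cluster size rather than distance, which is the tuning advantage the abstract refers to.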
Subjects
Free-text keywords: Statistics - Machine Learning