Publication: Preprint, 2013

Clustering for high-dimension, low-sample size data using distance vectors

Terada, Yoshikazu
Open Access English
  • Published: 11 Dec 2013
Abstract
In high-dimension, low-sample size (HDLSS) data, closeness between two objects does not always reflect a hidden cluster structure. We point out the important fact that it is not the closeness but the "values" of the distances that contain information about the cluster structure in high-dimensional space. Based on this fact, we propose an efficient and simple clustering approach, called distance vector clustering, for HDLSS data. Under the assumptions given in the work of Hall et al. (2005), we show that the proposed approach provides the true cluster labels under milder conditions when the dimension tends to infinity with the sample size fixed. The effectiveness of the ...
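The abstract's central observation can be illustrated with a small toy experiment. The sketch below is not the author's algorithm; it is a minimal illustration under assumptions chosen here: two Gaussian groups with the same mean but different scales, and a simple two-seed grouping rule applied to the rows of the pairwise distance matrix (the "distance vectors"). All function names (`make_toy_data`, `row_dissimilarity`, `two_group_labels`, and so on) are hypothetical.

```python
import math
import random


def make_toy_data(dim=2000, per_group=5, seed=0):
    """Two groups with the same mean but different scales (an assumed toy setup)."""
    rng = random.Random(seed)
    tight = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(per_group)]
    spread = [[rng.gauss(0.0, 2.0) for _ in range(dim)] for _ in range(per_group)]
    return tight + spread  # indices 0..per_group-1 are the tight group


def pairwise_distances(points):
    """Symmetric matrix of Euclidean distances; row i is the distance vector of point i."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def nearest_neighbor(dist, i):
    """Index of the point closest to point i (excluding i itself)."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])


def row_dissimilarity(dist, i, j):
    """Euclidean distance between distance vectors i and j.

    Entries i and j are skipped so that the zero self-distances on the
    diagonal do not dominate the comparison (a choice made for this sketch).
    """
    return math.sqrt(sum((dist[i][k] - dist[j][k]) ** 2
                         for k in range(len(dist)) if k != i and k != j))


def two_group_labels(dist):
    """Label each point by which of two seed rows its distance vector resembles.

    Seeds: point 0 and the point whose distance vector differs most from it.
    """
    n = len(dist)
    far = max(range(n), key=lambda j: row_dissimilarity(dist, 0, j))
    return [0 if row_dissimilarity(dist, i, 0) <= row_dissimilarity(dist, i, far) else 1
            for i in range(n)]


if __name__ == "__main__":
    dist = pairwise_distances(make_toy_data())
    print("nearest neighbors:", [nearest_neighbor(dist, i) for i in range(len(dist))])
    print("distance-vector labels:", two_group_labels(dist))
```

In this setup, points in the widely spread group lie closer to points of the tight group than to one another, so nearest-neighbor closeness points to the wrong group for them. The rows of the distance matrix, by contrast, differ clearly between the two groups, which is what the two-seed rule exploits.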
Subjects
free text keywords: Statistics - Machine Learning, Computer Science - Learning

[1] Ahn, J., Lee, M. H., and Yoon, Y. J. (2013). Clustering high dimension, low sample size data using the maximal data piling distance. Statist. Sinica. 22 443-464.

[2] Ahn, J., Marron, J. S., Muller, K. M., and Chi, Y.-Y. (2007). The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika. 94 760-766.

[3] Borysov, P., Hannig, J., and Marron, J. S. (2013). Asymptotics of hierarchical clustering for growing dimension. To appear in J. Multivar. Anal.

[4] Dettling, M. (2004). BagBoosting for tumor classification with gene expression data. Bioinformatics. 20 3583-3593.

[5] Hall, P., Marron, J. S., and Neeman, A. (2005). Geometric representation of high dimension, low sample size data. J. R. Statist. Soc. B. 67 427-444.

[6] Liu, L., Hayes, D. N., Nobel, A., and Marron, J. S. (2008). Statistical significance of clustering for high-dimensional, low-sample size data. J. Amer. Statist. Assoc. 58 236-244.

[7] Witten, D. M. and Tibshirani, R. (2010). A framework for feature selection in clustering. J. Amer. Statist. Assoc. 105 713-726.
