
doi: 10.1002/sam.11211
Abstract: There has been growing interest in kernel methods for classification, clustering and dimension reduction. For example, kernel Fisher discriminant analysis, spectral clustering and kernel principal component analysis are widely used in statistical learning and data mining applications. The empirical success of kernel methods is generally attributed to the nonlinear feature mapping induced by the kernel, which in turn determines a low-dimensional data embedding. It is important to understand the effect of a kernel and its associated kernel parameter(s) on the embedding in relation to data distributions. In this paper, we examine the geometry of the nonlinear embedding for kernel principal component analysis (PCA) when polynomial kernels are used. We carry out an eigen-analysis of the polynomial kernel operator associated with data distributions and investigate the effect of the degree of the polynomial. The results provide both insights into the geometry of nonlinear data embedding and practical guidelines for choosing an appropriate degree for dimension reduction with polynomial kernels. We further comment on the effect of centering kernels on the spectral properties of the polynomial kernel operator. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
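To make the setting concrete, here is a minimal finite-sample sketch in Python/NumPy, not the authors' code. It forms a polynomial-kernel Gram matrix K, optionally double-centers it, and reads off the eigenvalues of K/n, which serve as sample estimates of the kernel operator's eigenvalues. The function names, the offset c = 1 in k(x, y) = (<x, y> + c)^d, the rank tolerance and the Gaussian toy data are all assumptions made for illustration.

```python
import numpy as np


def polynomial_kernel(X, Y, degree, c=1.0):
    """Polynomial kernel k(x, y) = (<x, y> + c) ** degree.

    The offset c = 1.0 is an assumed default, not taken from the paper.
    """
    return (X @ Y.T + c) ** degree


def center_gram(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T.

    This corresponds to centering the implicit feature vectors at their mean.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_pca(X, degree, n_components=2, c=1.0, center=True):
    """Hypothetical kernel PCA sketch with a polynomial kernel.

    Returns the training-point embedding and the eigenvalues of K / n,
    i.e. finite-sample estimates of the kernel operator's eigenvalues.
    """
    n = X.shape[0]
    K = polynomial_kernel(X, X, degree, c)
    if center:
        K = center_gram(K)
    # eigh handles symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training points onto component k is sqrt(lambda_k) * v_k
    lam = np.clip(eigvals[:n_components], 0.0, None)
    embedding = eigvecs[:, :n_components] * np.sqrt(lam)
    return embedding, eigvals / n


def numerical_rank(eigvals, rtol=1e-8):
    """Count eigenvalues above a relative tolerance (rtol is an assumption)."""
    return int(np.sum(eigvals > rtol * eigvals.max()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))  # synthetic 2-D data (assumption)
    for d in (1, 2, 3):
        emb, lam = kernel_pca(X, degree=d)
        print(f"degree={d}: leading eigenvalues {np.round(lam[:4], 3)}, "
              f"numerical rank {numerical_rank(lam)}")
```

For generic 2-D inputs, the centered Gram matrix has as many nonzero eigenvalues as there are non-constant monomials of degree at most d, namely C(2 + d, d) - 1, so the printed ranks should be 2, 5 and 9. Raising the degree adds spectral directions to the embedding, and centering removes exactly the constant direction that the uncentered kernel carries.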
Keywords: kernel methods, Classification and discrimination; cluster analysis (statistical aspects), Gaussian kernel, kernel PCA, Factor analysis and principal components; correspondence analysis, nonlinear embedding
