Preprint · 2017

Improved Fixed-Rank Nyström Approximation via QR Decomposition: Practical and Theoretical Aspects

Pourkamali-Anaraki, Farhad; Becker, Stephen
Open Access · English
Published: 08 Aug 2017
Abstract
The Nyström method is a popular technique that uses a small number of landmark points to compute a fixed-rank approximation of the large kernel matrices that arise in machine learning problems. In practice, to ensure high-quality approximations, the number of landmark points is chosen to be greater than the target rank. However, for simplicity, the standard Nyström method uses a sub-optimal procedure for rank reduction. In this paper, we examine the drawbacks of the standard Nyström method in terms of poor performance and lack of theoretical guarantees. To address these issues, we present an efficient modification for generating improved fixed-rank Nyström approximations...
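As context for the abstract, the sketch below contrasts the two rank-reduction routes it alludes to: the standard construction, which truncates the small inner matrix W to rank k before pseudo-inverting it, and a QR-based alternative in the spirit of the title, which first orthogonalizes the landmark columns and then truncates. This is a minimal NumPy illustration, not the authors' code; the Gaussian RBF kernel, uniform landmark sampling, and all function and parameter names (rbf_kernel, nystrom_standard, nystrom_qr, m, k, gamma) are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel between rows of X and rows of Y (assumed kernel choice)."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_standard(X, m, k, gamma=1.0, seed=0):
    """Standard fixed-rank Nystrom: truncate the m-by-m inner matrix W to rank k."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)   # uniform landmark sampling
    C = rbf_kernel(X, X[idx], gamma)                       # n-by-m columns of K
    W = C[idx, :]                                          # m-by-m landmark block
    eigvals, eigvecs = np.linalg.eigh(W)
    top = np.argsort(eigvals)[::-1][:k]                    # keep the k largest eigenpairs of W
    Wk_pinv = (eigvecs[:, top]
               @ np.diag(1.0 / np.maximum(eigvals[top], 1e-12))
               @ eigvecs[:, top].T)
    return C @ Wk_pinv @ C.T                               # rank-k approximation of K


def nystrom_qr(X, m, k, gamma=1.0, seed=0):
    """QR-based fixed-rank Nystrom: truncate after orthogonalizing the columns of C."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)
    W = C[idx, :]
    Q, R = np.linalg.qr(C)                                 # thin QR: C = Q R, Q is n-by-m
    M = R @ np.linalg.pinv(W) @ R.T                        # small m-by-m core of C W^+ C^T
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:k]
    U = Q @ eigvecs[:, top]                                # top-k eigenvectors of the Nystrom matrix
    return U @ np.diag(eigvals[top]) @ U.T


if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((500, 10))
    K = rbf_kernel(X, X)
    for f in (nystrom_standard, nystrom_qr):
        err = np.linalg.norm(K - f(X, m=50, k=10)) / np.linalg.norm(K)
        print(f.__name__, "relative Frobenius error:", round(err, 4))
```

Since the QR variant returns the best rank-k approximation of the full Nyström matrix C W⁺ Cᵀ, it is by construction at least as close to that matrix as the standard truncation; whether this translates into a better approximation of K itself is precisely the practical and theoretical question the paper studies.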
Subjects
free text keywords: Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning, Computer Science - Machine Learning