Publication · Preprint · Conference object · 2017

Sparse Coding and Autoencoders

Akshay Rangamani, Anirbit Mukherjee, Amitabh Basu, Ashish Arora, Tejaswini Ganapathi, Sang Chin, Trac D. Tran
Open Access · English
  • Published: 11 Aug 2017
Abstract
In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y \in \mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is a $\mathbb{R}^n \rightarrow \mathbb{R}^n$ mapping with a single ReLU activation layer of size $h$. Under very mild distribu...
Subjects
free text keywords: Computer Science - Learning, Mathematics - Optimization and Control, Statistics - Machine Learning, Matrix (mathematics), Neural coding, Discrete mathematics, Gradient descent, Code dimension, Combinatorics, Square (algebra), Computer science
Funded by
NSF | CAREER: Foundational Aspects of Discrete Optimization: Theory and Algorithms
Project
  • Funder: National Science Foundation (NSF)
  • Project Code: 1452820
  • Funding stream: Directorate for Engineering | Division of Civil, Mechanical & Manufacturing Innovation