Publication · Doctoral thesis · 2017

Variational Inference and Deep Learning: A New Synthesis

Kingma, D.P.
Open Access · English
Published: 01 Jan 2017
Abstract
In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.
Subjects
arXiv: Computer Science::Machine Learning

Table of Contents (excerpt)

1 Introduction and Background
  1.1 Artificial Intelligence
  1.2 Probabilistic Models and Variational Inference
    1.2.1 Conditional Models
  1.3 Parameterizing Conditional Distributions with Neural Networks
  1.4 Directed Graphical Models and Neural Networks
  1.5 Learning in Fully Observed Models with Neural Nets
    1.5.1 Dataset
    1.5.2 Maximum Likelihood and Minibatch SGD
    1.5.3 Bayesian Inference
  1.6 Learning and Inference in Deep Latent Variable Models
    1.6.1 Latent Variables
    1.6.2 Deep Latent Variable Models
    1.6.3 Example DLVM for Multivariate Bernoulli Data
  1.7 Intractabilities
  1.8 Research Questions and Contributions

2 Variational Autoencoders
  2.1 Introduction
  2.2 Encoder or Approximate Posterior
  2.3 Evidence Lower Bound (ELBO)
    2.3.1 A Double-Edged Sword
  2.4 Stochastic Gradient-Based Optimization of the ELBO
  2.5 Reparameterization Trick (a minimal sketch follows this list)
    2.5.1 Change of Variables
    2.5.2 Gradient of Expectation Under Change of Variable
    2.5.3 Gradient of ELBO
    2.5.4 Computation of log q_φ(z|x)
  2.6 Factorized Gaussian Posteriors
    2.6.1 Full-Covariance Gaussian Posterior
  2.7 Estimation of the Marginal Likelihood
  2.8 Marginal Likelihood and ELBO as KL Divergences

5 Inverse Autoregressive Flow
  5.1 Requirements for Computational Tractability
  5.2 Improving the Flexibility of Inference Models
    5.2.1 Auxiliary Latent Variables
    5.2.2 Normalizing Flows
  5.3 Inverse Autoregressive Transformations
  5.4 Inverse Autoregressive Flow (IAF)
  5.5 Related Work
  5.6 Experiments
    5.6.1 MNIST
    5.6.2 CIFAR-10
  5.7 Conclusion
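
As a point of reference for the chapter 2 entries above, here is a minimal NumPy sketch of a single-sample Monte Carlo estimate of the ELBO using the reparameterization trick with a factorized Gaussian posterior. It is an illustration under generic assumptions, not code from the thesis: the linear "encoder" W_enc and "decoder" W_dec, the Bernoulli likelihood, and all dimensions and values are placeholders chosen only to make the example runnable.

# Minimal, self-contained sketch (NumPy only, no training loop) of a
# single-sample ELBO estimate with the reparameterization trick.
# The encoder/decoder are placeholder linear maps for illustration;
# they are NOT the architectures used in the thesis.
import numpy as np

rng = np.random.default_rng(0)

def log_bernoulli(x, logits):
    """log p(x | z) for binary data under a Bernoulli decoder."""
    return np.sum(x * logits - np.logaddexp(0.0, logits))

def log_normal(z, mean, log_var):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return -0.5 * np.sum(log_var + (z - mean) ** 2 / np.exp(log_var)
                         + np.log(2.0 * np.pi))

# Toy dimensions and placeholder parameters (illustrative values only).
x_dim, z_dim = 784, 32
x = rng.integers(0, 2, size=x_dim).astype(float)         # one binary datapoint
W_enc = rng.normal(scale=0.01, size=(2 * z_dim, x_dim))   # "encoder"
W_dec = rng.normal(scale=0.01, size=(x_dim, z_dim))       # "decoder"

# Encoder: q_phi(z | x) = N(mu, diag(exp(log_var)))
h = W_enc @ x
mu, log_var = h[:z_dim], h[z_dim:]

# Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so the
# sampling noise is independent of the parameters and, under autodiff,
# gradients can flow through mu and log_var.
eps = rng.standard_normal(z_dim)
z = mu + np.exp(0.5 * log_var) * eps

# Single-sample Monte Carlo estimate of the ELBO:
#   log p(x | z) + log p(z) - log q(z | x)
elbo = (log_bernoulli(x, W_dec @ z)
        + log_normal(z, np.zeros(z_dim), np.zeros(z_dim))   # prior N(0, I)
        - log_normal(z, mu, log_var))
print("single-sample ELBO estimate:", elbo)

In practice the same estimate is computed inside an automatic differentiation framework, so the gradient of the ELBO with respect to the encoder and decoder parameters flows through mu and log_var while the noise eps stays parameter-free.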
