publication . Preprint . 2012

Joint Training of Deep Boltzmann Machines

Goodfellow, Ian; Courville, Aaron; Bengio, Yoshua
Open Access English
  • Published: 11 Dec 2012
We introduce a new method for training deep Boltzmann machines jointly. Prior methods require an initial learning pass that trains the deep Boltzmann machine greedily, one layer at a time, or do not perform well on classification tasks.
ACM Computing Classification System: ComputingMethodologies_PATTERNRECOGNITION
free text keywords: Statistics - Machine Learning, Computer Science - Learning

