Publication · Preprint · 2019

On Variational Bounds of Mutual Information

Poole, Ben; Ozair, Sherjil; Oord, Aaron van den; Alemi, Alexander A.; Tucker, George
Open Access · English
  • Published: 16 May 2019
Abstract
Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remain unclear. In this work, we unify these recent developments in a single framework. We find that the existing variational lower bounds degrade when the MI is large, exhibiting either high bias or high variance. To address this problem, we introduce a continuum of lower bounds that encompasses previous bounds and flexibly trades...
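
For concreteness, one member of the family of bounds the paper unifies is the InfoNCE (contrastive) lower bound, analyzed alongside the Barber-Agakov and Donsker-Varadhan bounds cited below. The following is a minimal NumPy sketch, not the authors' code: it evaluates the bound on a toy correlated-Gaussian pair where the true MI is known in closed form, substituting the analytically optimal critic f(x, y) = log p(y | x) for a learned neural network; all variable names are illustrative.

import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE lower bound on I(X; Y) from a (K, K) critic score matrix.

    scores[i, j] = f(x_i, y_j) for a batch of K jointly drawn pairs, so
    the diagonal holds the positive-pair scores. The estimate is capped
    at log K regardless of the true MI.
    """
    K = scores.shape[0]
    # Numerically stable row-wise log-sum-exp.
    m = scores.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return np.mean(np.diag(scores) - lse) + np.log(K)

# Toy problem: (X, Y) jointly Gaussian with correlation rho, so the true
# MI is -0.5 * log(1 - rho**2) nats and the bound can be checked directly.
rng = np.random.default_rng(0)
rho, K = 0.9, 512
x = rng.standard_normal(K)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(K)
# Optimal critic up to an additive constant: f(x_i, y_j) = log p(y_j | x_i),
# where p(y | x) = N(rho * x, 1 - rho**2). Stands in for a trained network.
scores = -((y[None, :] - rho * x[:, None]) ** 2) / (2 * (1 - rho**2))
print("InfoNCE estimate:", infonce_lower_bound(scores))
print("True MI:         ", -0.5 * np.log(1 - rho**2))

Here the true MI is about 0.83 nats, well below the log K ≈ 6.2 cap, so the bound is informative; when the true MI approaches or exceeds log K, the estimate saturates, which is one concrete instance of the bias at large MI described in the abstract.
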
Subjects
free text keywords: Computer Science - Machine Learning, Statistics - Machine Learning
51 references, page 1 of 4

Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K. Deep variational information bottleneck. arXiv preprint arXiv:1612.00410, 2016.

Alemi, A. A., Poole, B., Fischer, I., Dillon, J. V., Saurous, R. A., and Murphy, K. Fixing a broken ELBO. arXiv preprint arXiv:1711.00464, 2017.

Barber, D. and Agakov, F. The IM algorithm: A variational approach to information maximization. In NIPS, pp. 201-208. MIT Press, 2003.

Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Hjelm, D., and Courville, A. Mutual information neural estimation. In International Conference on Machine Learning, pp. 530-539, 2018.

Bell, A. J. and Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129-1159, 1995.

Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859-877, 2017.

Burgess, C. P., Higgins, I., Pal, A., Matthey, L., Watters, N., Desjardins, G., and Lerchner, A. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599, 2018.

Chen, T. Q., Li, X., Grosse, R., and Duvenaud, D. Isolating sources of disentanglement in variational autoencoders. arXiv preprint arXiv:1802.04942, 2018.

Doersch, C., Gupta, A., and Efros, A. A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1422-1430, 2015.

Donsker, M. D. and Varadhan, S. S. Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics, 36(2):183-212, 1983.

Dosovitskiy, A., Springenberg, J. T., Riedmiller, M., and Brox, T. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 766-774, 2014.

Foster, A., Jankowiak, M., Bingham, E., Teh, Y. W., Rainforth, T., and Goodman, N. Variational optimal experiment design: Efficient automation of adaptive experiments. In NeurIPS Bayesian Deep Learning Workshop, 2018.

Gabrié, M., Manoel, A., Luneau, C., Barbier, J., Macris, N., Krzakala, F., and Zdeborová, L. Entropy and mutual information in models of deep neural networks. arXiv preprint arXiv:1805.09785, 2018.

Gao, S., Ver Steeg, G., and Galstyan, A. Efficient estimation of mutual information for strongly dependent variables. In Artificial Intelligence and Statistics, pp. 277-286, 2015.

Higgins, I., Matthey, L., Glorot, X., Pal, A., Uria, B., Blundell, C., Mohamed, S., and Lerchner, A. Early visual concept learning with unsupervised deep learning. arXiv preprint arXiv:1606.05579, 2016.
