Publication · Preprint · 2020

Adversarial Mutual Information for Text Generation

Pan, Boyuan; Yang, Yazheng; Liang, Kaizhao; Kailkhura, Bhavya; Jin, Zhongming; Hua, Xian-Sheng; Cai, Deng; Li, Bo
Open Access · English
  • Published: 30 Jun 2020
Abstract
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated the effectiveness of MI maximization in text generation. However, previous work has paid little attention to modeling the backward network of MI (i.e., the dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework formulated as a novel saddle-point (min-max) optimization that aims to identify joint interactions between the source and target. Within this framework, the forward and backward networks iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations in the high-level semantic space to enhance long-term dependency in the generation process. Extensive experiments on different text generation tasks demonstrate that the proposed AMI framework significantly outperforms several strong baselines, and we show that AMI has the potential to yield a tighter lower bound for the variational information maximization problem.
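The tightness claim rests on the variational information maximization bound of Barber & Agakov (2003), listed in the references below. With a variational backward model q_\phi(x \mid y) standing in for the true posterior, the bound reads

    I(X;Y) = H(X) - H(X \mid Y) \ge H(X) + \mathbb{E}_{p(x,y)}[\log q_\phi(x \mid y)],

and it is tight exactly when q_\phi(x \mid y) = p(x \mid y), which is why the quality of the backward (target-to-source) network governs how tight the bound can be.

The abstract does not spell out the AMI objective itself, so the following Python sketch is only a generic, hypothetical illustration of the saddle-point (min-max) pattern it describes: gradient descent-ascent on a toy convex-concave game, with the "min" player standing in for one network and the "max" player for the other. All names and constants here are illustrative, not taken from the paper.

    # Toy saddle-point optimization: min over x, max over y of
    # f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (convex in x, concave in y,
    # unique saddle point at the origin). NOT the AMI objective.
    def f(x, y):
        return 0.5 * x**2 + x * y - 0.5 * y**2

    x, y, lr = 2.0, -1.5, 0.1
    for _ in range(200):
        gx = x + y      # df/dx at the current point
        gy = x - y      # df/dy at the current point
        x -= lr * gx    # "min" player takes a descent step
        y += lr * gy    # "max" player takes an ascent step

    print(f"near saddle: x={x:.4f}, y={y:.4f}, f={f(x, y):.6f}")

For this particular game the iterates spiral into the saddle for any step size below 1; adversarial text generation objectives are far less forgiving, which is one reason min-max training of the forward and backward networks is delicate.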
Comment: Published at ICML 2020
Subjects
Free-text keywords: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
References (52 total; page 1 of 4)

Akoury, N., Krishna, K., and Iyyer, M. Syntactically supervised transformers for faster neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1269-1281, 2019.

Arjovsky, M. and Bottou, L. Towards principled methods for training generative adversarial networks. In 5th International Conference on Learning Representations, 2017.

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pp. 214-223, 2017.

Artetxe, M., Labaka, G., Agirre, E., and Cho, K. Unsupervised neural machine translation. In International Conference on Learning Representations, 2018.

Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

Bahl, L., Brown, P., De Souza, P., and Mercer, R. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In ICASSP'86, IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, pp. 49-52. IEEE, 1986.

Barber, D. and Agakov, F. V. The IM algorithm: a variational approach to information maximization. In Advances in Neural Information Processing Systems, 2003.

Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A., Jozefowicz, R., and Bengio, S. Generating sentences from a continuous space. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 10-21, 2016.

Che, T., Li, Y., Zhang, R., Hjelm, R. D., Li, W., Song, Y., and Bengio, Y. Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983, 2017.

Chen, L., Dai, S., Tao, C., Zhang, H., Gan, Z., Shen, D., Zhang, Y., Wang, G., Zhang, R., and Carin, L. Adversarial text generation via feature-mover's distance. In Advances in Neural Information Processing Systems, pp. 4666-4677, 2018.

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172-2180, 2016.

Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724-1734, 2014.

Foster, A., Jankowiak, M., Bingham, E., Teh, Y. W., Rainforth, T., and Goodman, N. Variational optimal experiment design: Efficient automation of adaptive experiments. NeurIPS Bayesian Deep Learning Workshop, 2018.

Gabrié, M., Manoel, A., Luneau, C., Macris, N., Krzakala, F., Zdeborová, L., et al. Entropy and mutual information in models of deep neural networks. In Advances in Neural Information Processing Systems, pp. 1821-1831, 2018.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.
