Publication · Preprint · 2017

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Peng, Baolin; Li, Xiujun; Gao, Jianfeng; Liu, Jingjing; Chen, Yun-Nung; Wong, Kam-Fai
Open Access · Language: English
  • Published: 30 Oct 2017
Abstract
This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to distinguish responses/actions generated by dialogue agents from those produced by experts. We then incorporate the discriminator as an additional critic in the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore state-action pairs within the regions where it takes actions similar to those of the experts. Experimental results in a movie-ticket booking doma...
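The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical sketch of the idea in plain Python, not the authors' implementation. Several pieces are assumptions made for illustration: a logistic-regression model stands in for the discriminator network, the `features` map (state vector plus one-hot action) is invented, and the way the discriminator enters the advantage (adding a weighted logit of D(s, a) to the usual one-step A2C advantage, scaled by a hypothetical `lam`) is one plausible scheme, not necessarily the paper's.

```python
import math


def features(state, action, n_actions):
    """Hypothetical feature map: state vector followed by a one-hot action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(n_actions)]
    return list(state) + one_hot


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Discriminator:
    """Logistic-regression stand-in for the discriminator D(s, a).

    prob_expert(x) estimates the probability that the state-action
    features x came from an expert rather than from the agent.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, label):
        # One SGD step on binary cross-entropy; label 1 = expert, 0 = agent.
        grad = self.prob_expert(x) - label
        self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * grad


def combined_advantage(reward, v_s, v_next, d_prob,
                       gamma=0.9, lam=0.5, done=False):
    """One-step A2C advantage plus an assumed discriminator term."""
    # Value critic: TD error r + gamma * V(s') - V(s) (no bootstrap if done).
    td_target = reward + (0.0 if done else gamma * v_next)
    a_value = td_target - v_s
    # Assumed second critic: logit of D(s, a); zero at D = 0.5, positive
    # when the action looks expert-like, negative when it looks agent-like.
    p = min(max(d_prob, 1e-8), 1.0 - 1e-8)
    a_disc = math.log(p) - math.log(1.0 - p)
    return a_value + lam * a_disc
```

In an actor update, the returned value would multiply the gradient of log π(a|s), so expert-like actions get pushed up more strongly than agent-like ones with the same environment reward. Setting `lam=0` reduces the sketch to ordinary one-step A2C.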
Subjects
Free-text keywords: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Learning