Subject: Computer Science - Computation and Language | Computer Science - Artificial Intelligence | Computer Science - Learning
This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discrimi...