Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Preprint English OPEN
Peng, Baolin; Li, Xiujun; Gao, Jianfeng; Liu, Jingjing; Chen, Yun-Nung; Wong, Kam-Fai
  • Subject: Computer Science - Computation and Language | Computer Science - Artificial Intelligence | Computer Science - Learning

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discrimi...