End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Preprint · English · Open Access
Zhou, Li; Small, Kevin; Rokhlenko, Oleg; Elkan, Charles
  • Subject: Computer Science - Computation and Language | Computer Science - Artificial Intelligence | Computer Science - Learning

Learning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between customers and trained human agent...
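The abstract describes learning a dialog policy offline, via policy gradient, from transcripts logged under a different (human or legacy) policy. A standard ingredient in such off-policy setups is an importance-weighted REINFORCE update with truncated (clipped) weights to control variance. The sketch below is illustrative only, not the paper's actual method: it uses a state-less softmax policy over a few actions, and all names (`offline_pg_update`, the toy logged data, the weight cap `w_max`) are assumptions introduced here for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def offline_pg_update(theta, logged, behavior_probs, lr=0.1, w_max=5.0):
    """One truncated-importance-sampling REINFORCE step on logged data.

    theta          : (n_actions,) logits of the learner's softmax policy
    logged         : list of (action, reward) pairs from logged dialogs
    behavior_probs : (n_actions,) action probabilities of the logging policy
    w_max          : cap on the importance weight (truncation)
    """
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for a, r in logged:
        # Importance weight pi(a)/mu(a), truncated to bound variance.
        w = min(pi[a] / behavior_probs[a], w_max)
        # grad log pi(a) for a softmax policy is e_a - pi.
        g = -pi.copy()
        g[a] += 1.0
        grad += w * r * g
    return theta + lr * grad / len(logged)

# Toy usage: only action 1 is rewarded; repeated offline updates on the
# fixed log should shift the learned policy's mass toward action 1.
rng = np.random.default_rng(0)
behavior = np.array([0.5, 0.25, 0.25])          # hypothetical logging policy
logged = [(a, 1.0 if a == 1 else 0.0)
          for a in rng.choice(3, size=200, p=behavior)]
theta = np.zeros(3)
for _ in range(50):
    theta = offline_pg_update(theta, logged, behavior)
print(softmax(theta).argmax())  # expected: 1
```

The truncation cap trades a small bias for much lower variance when the learned policy drifts away from the logging policy; without it, a few rare logged actions can dominate the gradient.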