Goal-oriented Dialogue Policy Learning from Failures
Lu, Keting; Zhang, Shiqi; Chen, Xiaoping
Subject: Computer Science - Computation and Language | Computer Science - Artificial Intelligence
Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the very few successful dial...
[Andrychowicz et al. 2017] Andrychowicz, M.; Wolski, F.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Abbeel, O. P.; and Zaremba, W. 2017. Hindsight experience replay. In Advances in Neural Information Processing Systems, 5048-5058.
[Asadi and Williams 2016] Asadi, K., and Williams, J. D. 2016. arXiv preprint arXiv:1612.06000.
[Bernsen, Dybkjaer, and Dybkjaer 1996] Bernsen, N. O.; Dybkjaer, H.; and Dybkjaer, L. 1996. Principles for the design of cooperative spoken human-machine dialogue. In Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, volume 2, 729-732. IEEE.
[Boularias, Chinaei, and Chaib-draa 2010] Boularias, A.; Chinaei, H. R.; and Chaib-draa, B. 2010. Learning the reward model of dialogue POMDPs from data. In NIPS Workshop on Machine Learning for Assistive Techniques.
[Chandramohan, Geist, and Pietquin 2010] Chandramohan, S.; Geist, M.; and Pietquin, O. 2010. Optimizing spoken dialogue management with fitted value iteration. In Eleventh Annual Conference of the International Speech Communication Association.
[Cuayáhuitl 2017] Cuayáhuitl, H. 2017. SimpleDS: A simple deep reinforcement learning dialogue system. In Dialogues with Social Robots. Springer. 109-118.
[El Asri, Laroche, and Pietquin 2012] El Asri, L.; Laroche, R.; and Pietquin, O. 2012. Reward function learning for dialogue management. In STAIRS, 95-106.
[Fatemi et al. 2016] Fatemi, M.; El Asri, L.; Schulz, H.; He, J.; and Suleman, K. 2016. Policy networks with two-stage training for dialogue systems. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 101-110.