Steve Young, Milica Gašić, Blaise Thomson, and Jason D Williams, “POMDP-based statistical spoken dialog systems: A review,” Proceedings of the IEEE, vol. 101, no. 5, pp. 1160-1179, 2013.
 Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, and Kaheer Suleman, “Policy networks with two-stage training for dialogue systems,” arXiv preprint arXiv:1606.03152, 2016.
 Tiancheng Zhao and Maxine Eskenazi, “Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning,” in Proceedings of SIGDIAL, 2016.
 Pei-Hao Su, Milica Gašić, Nikola Mrkšić, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, and Steve Young, “Continuously learning neural dialogue management,” arXiv preprint arXiv:1606.02689, 2016.
 Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, and Asli Celikyilmaz, “End-to-end task-completion neural dialogue systems,” in Proceedings of IJCNLP, 2017.
 Jason D Williams, Kavosh Asadi, and Geoffrey Zweig, “Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning,” in Proceedings of ACL, 2017.
 Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng, “Towards end-to-end reinforcement learning of dialogue agents for information access,” in Proceedings of ACL, 2017, pp. 484-495.
 Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, and Kam-Fai Wong, “Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning,” in EMNLP, 2017, pp. 2221-2230.
 Bing Liu and Ian Lane, “Iterative policy learning in end-to-end trainable task-oriented neural dialog models,” arXiv preprint arXiv:1709.06136, 2017.
 Zachary C Lipton, Jianfeng Gao, Lihong Li, Xiujun Li, Faisal Ahmed, and Li Deng, “Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking,” arXiv preprint arXiv:1608.05081, 2016.
 Nuttapong Chentanez, Andrew G Barto, and Satinder P Singh, “Intrinsically motivated reinforcement learning,” in NIPS, 2005, pp. 1281-1288.
 Shakir Mohamed and Danilo Jimenez Rezende, “Variational information maximisation for intrinsically motivated reinforcement learning,” in NIPS, 2015, pp. 2125-2133.
 Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel, “VIME: Variational information maximizing exploration,” in NIPS, 2016, pp. 1109-1117.
 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, and Koray Kavukcuoglu, “Reinforcement learning with unsupervised auxiliary tasks,” arXiv preprint arXiv:1611.05397, 2016.
 Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672-2680.