Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy

Zhao, Yangyang; Wang, Zhenyu; Zhu, Changxi; Wang, Shihan

DOI: https://doi.org/10.18653/v1/2021.emnlp-main.354

Files

2021.emnlp_main.354.pdf (3.47 MB)

Publication date

2021-11-10

Authors

Zhao, Yangyang

Wang, Zhenyu

Zhu, Changxi

Wang, Shihan

Editors

Moens, Marie-Francine

Huang, Xuanjing

Specia, Lucia

Yih, Scott Wen-tau

DOI

https://doi.org/10.18653/v1/2021.emnlp-main.354

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY

Abstract

Deep reinforcement learning has shown great potential for training dialogue policies. However, its favorable performance comes at the cost of many rounds of interaction. Most existing dialogue policy methods rely on a single learning system, whereas the human brain has two specialized learning and memory systems that support finding good solutions without requiring copious examples. Inspired by the human brain, this paper proposes a novel complementary policy learning (CPL) framework, which exploits the complementary advantages of the episodic memory (EM) policy and the deep Q-network (DQN) policy to achieve fast and effective dialogue policy learning. To coordinate the two policies, we propose a confidence controller that determines when each policy takes over, according to their relative efficacy at different stages. Furthermore, we propose memory connectivity and time pruning to ensure flexible and adaptive generalization of the EM policy in dialogue tasks. Experimental results on three dialogue datasets show that our method significantly outperforms existing methods that rely on a single learning system.
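The abstract does not specify how the EM policy, the DQN policy, or the confidence controller are implemented. As a rough illustration only, the sketch below pairs a tabular episodic memory (in the style of model-free episodic control, which stores the best return seen per state-action pair) with a Q-table standing in for the deep Q-network, and uses a hypothetical controller that defers to whichever policy has the higher moving-average episode return. The class names, the moving-average rule, the tie-breaking, and the fallback from EM to the Q-policy on unseen states are all assumptions, not the paper's method; memory connectivity and time pruning are not modelled.

```python
from collections import defaultdict


class EpisodicMemory:
    """Tabular episodic memory: keeps the best discounted return observed for
    each (state, action) pair, in the style of model-free episodic control."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.table = {}  # (state, action) -> best return seen so far

    def update(self, episode, gamma=0.9):
        # episode is a list of (state, action, reward); walk it backwards so
        # g holds the discounted return from each step onward.
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            key = (state, action)
            self.table[key] = max(self.table.get(key, float("-inf")), g)

    def act(self, state):
        # Best remembered action, or None when the state was never visited.
        known = [(self.table[(state, a)], a)
                 for a in self.actions if (state, a) in self.table]
        return max(known)[1] if known else None


class TabularQ:
    """Stand-in for the DQN policy (assumption): a Q-table with one-step
    Q-learning instead of a neural network."""

    def __init__(self, actions, lr=0.1, gamma=0.9):
        self.actions = list(actions)
        self.q = defaultdict(float)
        self.lr, self.gamma = lr, gamma

    def act(self, state):
        # Greedy action; ties resolve to the first action in the list.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward if done else reward + self.gamma * best_next
        self.q[(state, action)] += self.lr * (target - self.q[(state, action)])


class ConfidenceController:
    """Hypothetical arbiter: an exponential moving average of the episode
    return earned under each policy; the better-scoring policy is chosen."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.score = {"em": 0.0, "dqn": 0.0}

    def choose(self):
        # Ties favour EM (an assumption: EM is meant to speed up early learning).
        return "em" if self.score["em"] >= self.score["dqn"] else "dqn"

    def record(self, source, episode_return):
        old = self.score[source]
        self.score[source] = self.decay * old + (1 - self.decay) * episode_return


def select_action(state, source, em, dqn):
    """Act with the chosen policy; EM defers to the Q-policy on unseen states."""
    if source == "em":
        action = em.act(state)
        if action is not None:
            return action, "em"
    return dqn.act(state), "dqn"
```

In use, one would call `choose()` at the start of each dialogue, pick actions with `select_action` at every turn, then pass the finished episode to `EpisodicMemory.update` and its return to `record`. The moving-average rule is only one simple way to express "relative efficacy"; the paper's controller may differ.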

Citation

Zhao, Y, Wang, Z, Zhu, C & Wang, S 2021, Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy. in M-F Moens, X Huang, L Specia & S W Yih (eds), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Dominican Republic, pp. 4311-4323. https://doi.org/10.18653/v1/2021.emnlp-main.354

URI

https://dspace.library.uu.nl/handle/1874/415026