Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy

Publication date

2021-11-10

Authors

Zhao, Yangyang
Wang, Zhenyu
Zhu, Changxi (ISNI 000000050790013X)
Wang, Shihan (ISNI 0000000492960219)

Editors

Moens, Marie-Francine
Huang, Xuanjing
Specia, Lucia
Yih, Scott Wen-tau

Document Type

Part of book
Open Access

License

CC BY

Abstract

Deep reinforcement learning has shown great potential in training dialogue policies. However, its favorable performance comes at the cost of many rounds of interaction. Most existing dialogue policy methods rely on a single learning system, whereas the human brain has two specialized learning and memory systems that allow it to find good solutions without requiring copious examples. Inspired by the human brain, this paper proposes a novel complementary policy learning (CPL) framework, which exploits the complementary advantages of the episodic memory (EM) policy and the deep Q-network (DQN) policy to achieve fast and effective dialogue policy learning. To coordinate the two policies, we propose a confidence controller that controls the timing of their complementary use according to their relative efficacy at different stages. Furthermore, memory connectivity and time pruning are proposed to guarantee the flexible and adaptive generalization of the EM policy in dialogue tasks. Experimental results on three dialogue datasets show that our method significantly outperforms existing methods relying on a single learning system.
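
The coordination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tabular memory, the scalar `em_confidence`, the fixed `threshold`, and the names `EpisodicMemory` and `choose_action` are all assumptions made for illustration, and the Q-values are passed in as a plain dictionary rather than computed by a network. Memory connectivity and time pruning are not modelled.

```python
from collections import defaultdict


class EpisodicMemory:
    """Hypothetical tabular memory: best return seen for each (state, action)."""

    def __init__(self):
        self.table = defaultdict(dict)

    def update(self, state, action, ret):
        # Keep only the highest return observed for this state-action pair.
        best = self.table[state].get(action, float("-inf"))
        self.table[state][action] = max(best, ret)

    def best_action(self, state):
        # Return None when the state has never been stored.
        actions = self.table.get(state)
        if not actions:
            return None
        return max(actions, key=actions.get)


def choose_action(state, memory, q_values, em_confidence, threshold=0.5):
    """Assumed controller: follow the EM policy when it knows the state and
    its confidence reaches the threshold; otherwise act greedily on Q-values."""
    em_action = memory.best_action(state)
    if em_action is not None and em_confidence >= threshold:
        return em_action, "EM"
    return max(q_values, key=q_values.get), "DQN"


memory = EpisodicMemory()
memory.update("s0", "request_date", 1.0)
memory.update("s0", "inform_price", 0.2)
q = {"request_date": 0.1, "inform_price": 0.4}

print(choose_action("s0", memory, q, em_confidence=0.9))
print(choose_action("s0", memory, q, em_confidence=0.1))
print(choose_action("s1", memory, q, em_confidence=0.9))
```

In this sketch the memory-based choice is used only for states it has already seen and only while the assumed confidence value is high; in every other case the greedy Q-value action is taken. How the paper actually computes confidence is not stated in this record.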

Citation

Zhao, Y, Wang, Z, Zhu, C & Wang, S 2021, Efficient Dialogue Complementary Policy Learning via Deep Q-network Policy and Episodic Memory Policy. in M-F Moens, X Huang, L Specia & S W Yih (eds), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Dominican Republic, pp. 4311-4323. https://doi.org/10.18653/v1/2021.emnlp-main.354