Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning

Publication date

2024-05-06

Authors

Zhao, Yangyang
Dastani, Mehdi (ISNI 0000000043464658)
Wang, Shihan (ISNI 0000000492960219)

Document Type

Conference article

License

CC BY

Abstract

Reinforcement Learning (RL) shows promise in optimizing task-oriented dialogue policies, but reward sparsity remains a challenge. Curriculum learning offers an effective solution by training dialogue policies from simple to complex goals, facilitating a smooth knowledge transition across varied goal complexities. However, these methods typically assume that goal difficulty increases gradually, so that the policy can adapt to difficult goals over time. In complex environments lacking intermediate goals, attaining smooth knowledge transitions becomes difficult. This paper proposes a novel Bootstrapped Policy Learning (BPL) framework that uses goal shaping to adaptively tailor a curriculum of progressively challenging subgoals for each complex goal. Goal shaping comprises goal decomposition and goal evolution: it breaks complex goals into solvable subgoals and progressively increases subgoal difficulty as the policy improves. BPL combines these two components, enabling smooth knowledge transitions from simple to complex goals and thereby improving the efficiency of task-oriented dialogue policy learning. Our experiments demonstrate the effectiveness of BPL in two complex dialogue environments.

Keywords

Curriculum Learning, Dialogue Policy, Goal Shaping, Reinforcement Learning, Artificial Intelligence, Software, Control and Systems Engineering

Citation

Zhao, Y, Dastani, M & Wang, S 2024, 'Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning', Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 2024, no. May, pp. 2615-2617. <https://dl.acm.org/doi/10.5555/3635637.3663245>