Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning
Publication date
2024-05-06
Document Type
Conference article
License
CC BY
Abstract
Reinforcement Learning (RL) shows promise for optimizing task-oriented dialogue policies, but reward sparsity remains a key challenge. Curriculum learning offers an effective solution by training dialogue policies on goals ordered from simple to complex, facilitating a smooth knowledge transition across goal complexities. However, these methods typically assume that goal difficulty can be increased gradually; in complex environments that lack intermediate goals, achieving such a smooth transition becomes difficult. This paper proposes a novel Bootstrapped Policy Learning (BPL) framework that adaptively tailors a curriculum of progressively challenging subgoals for each complex goal through goal shaping. Goal shaping comprises goal decomposition and goal evolution: breaking a complex goal into solvable subgoals, and progressively increasing subgoal difficulty as the policy improves. BPL combines these two aspects to enable a smooth knowledge transition from simple to complex goals, thereby improving the efficiency of task-oriented dialogue policy learning. Experiments in two complex dialogue environments demonstrate the effectiveness of BPL.
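The goal-shaping idea in the abstract can be sketched in a few lines: a complex dialogue goal (here represented as a set of slot constraints) is decomposed into an easier subgoal, and the subgoal is evolved toward the full goal once the policy's recent success rate is high enough. This is a minimal illustrative sketch, not the paper's implementation; the class name, threshold, and window size are assumptions.

```python
def decompose(goal, size):
    """Hypothetical goal decomposition: keep only the first `size`
    slot constraints (in sorted order) of the full goal."""
    slots = sorted(goal)
    return {s: goal[s] for s in slots[:size]}

class GoalShapingCurriculum:
    """Illustrative curriculum that evolves a subgoal toward the full
    complex goal as the dialogue policy improves (assumed scheme, not
    the paper's exact algorithm)."""

    def __init__(self, goal, promote_at=0.8, window=5):
        self.goal = goal              # full complex goal
        self.size = 1                 # current subgoal difficulty
        self.promote_at = promote_at  # success rate needed to evolve
        self.window = window          # episodes per evaluation window
        self.results = []

    def current_subgoal(self):
        return decompose(self.goal, self.size)

    def report(self, success):
        """Record one episode's outcome; evolve the subgoal when the
        recent success rate reaches the threshold."""
        self.results.append(success)
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.promote_at
                and self.size < len(self.goal)):
            self.size += 1            # move to a harder subgoal
            self.results.clear()
```

For example, a policy trained against `GoalShapingCurriculum({"area": "north", "food": "thai", "price": "cheap"})` first sees a one-slot subgoal and only graduates to two- and three-slot subgoals after sustained success, mirroring the smooth simple-to-complex transition the abstract describes.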
Keywords
Curriculum Learning, Dialogue Policy, Goal Shaping, Reinforcement Learning, Artificial Intelligence, Software, Control and Systems Engineering
Citation
Zhao, Y, Dastani, M & Wang, S 2024, 'Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning', Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 2024, no. May, pp. 2615-2617. <https://dl.acm.org/doi/10.5555/3635637.3663245>