Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning
Publication date
2024-05-06
Document Type
Conference article
License
CC BY
Abstract
Reinforcement Learning (RL) shows promise for optimizing task-oriented dialogue policies, but reward sparsity remains a key challenge. Curriculum learning offers an effective solution by training dialogue policies on goals ordered from simple to complex, facilitating a smooth knowledge transition across goal complexities. However, these methods typically assume that goal difficulty can be increased gradually; in complex environments that lack intermediate goals, achieving such a smooth transition becomes difficult. This paper proposes a novel Bootstrapped Policy Learning (BPL) framework that adaptively tailors a curriculum of progressively challenging subgoals for each complex goal through goal shaping. Goal shaping comprises goal decomposition and goal evolution: breaking a complex goal into solvable subgoals, and progressively increasing subgoal difficulty as the policy improves. BPL combines these two aspects to enable a smooth knowledge transition from simple to complex goals, thereby improving the efficiency of task-oriented dialogue policy learning. Experiments in two complex dialogue environments demonstrate the effectiveness of BPL.
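The goal-shaping idea in the abstract can be sketched in a few lines: a complex dialogue goal (here represented as a set of slot constraints) is decomposed into an easier subgoal, and the subgoal is evolved toward the full goal once the policy's recent success rate is high enough. This is a minimal illustrative sketch, not the paper's implementation; the class name, threshold, and window size are assumptions.

```python
def decompose(goal, size):
    """Hypothetical goal decomposition: keep only the first `size`
    slot constraints (in sorted order) of the full goal."""
    slots = sorted(goal)
    return {s: goal[s] for s in slots[:size]}

class GoalShapingCurriculum:
    """Illustrative curriculum that evolves a subgoal toward the full
    complex goal as the dialogue policy improves (assumed scheme, not
    the paper's exact algorithm)."""

    def __init__(self, goal, promote_at=0.8, window=5):
        self.goal = goal              # full complex goal
        self.size = 1                 # current subgoal difficulty
        self.promote_at = promote_at  # success rate needed to evolve
        self.window = window          # episodes per evaluation window
        self.results = []

    def current_subgoal(self):
        return decompose(self.goal, self.size)

    def report(self, success):
        """Record one episode's outcome; evolve the subgoal when the
        recent success rate reaches the threshold."""
        self.results.append(success)
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.promote_at
                and self.size < len(self.goal)):
            self.size += 1            # move to a harder subgoal
            self.results.clear()
```

For example, a policy trained against `GoalShapingCurriculum({"area": "north", "food": "thai", "price": "cheap"})` first sees a one-slot subgoal and only graduates to two- and three-slot subgoals after sustained success, mirroring the smooth simple-to-complex transition the abstract describes.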
Keywords
Curriculum Learning, Dialogue Policy, Goal Shaping, Reinforcement Learning, Artificial Intelligence, Software, Control and Systems Engineering
Citation
Zhao, Y, Dastani, M & Wang, S 2024, 'Bootstrapped Policy Learning: Goal Shaping for Efficient Task-oriented Dialogue Policy Learning', Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 2024, no. May, pp. 2615-2617. <https://dl.acm.org/doi/10.5555/3635637.3663245>