Speeding Up Q(lambda)-learning
Publication date
1998
Authors
Wiering, M.A.
Schmidhuber, J.
Document Type
Article in proceedings
Abstract
Q(lambda)-learning uses TD(lambda) methods to accelerate Q-learning. In previous online Q(lambda) implementations based on lookup tables, the worst-case complexity of a single update step is bounded by the size of the state-action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
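The idea of postponing updates can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes accumulating eligibility traces, takes the TD error of each step as a given input instead of computing it from Q-values, and omits the rescaling a real implementation would need once the decay factor underflows. The names `LazyTraceTable`, `eager_reference`, and all method and attribute names are hypothetical. Each table entry keeps a trace scaled by the current global decay, plus a snapshot of a global running sum of decayed TD errors; the difference between the running sum and the snapshot gives the update the entry still owes, so a time step costs O(1) and choosing an action touches only the entries of the current state.

```python
from collections import defaultdict


class LazyTraceTable:
    """Tabular Q-values with eligibility-trace updates applied on demand."""

    def __init__(self, alpha, gamma, lam):
        self.alpha = alpha
        self.decay = gamma * lam                 # per-step trace decay
        self.phi = 1.0                           # global decay so far: decay ** t
        self.total = 0.0                         # running sum of td_error * phi
        self.q = defaultdict(float)
        self.scaled_trace = defaultdict(float)   # true trace divided by phi
        self.seen_total = defaultdict(float)     # value of total at last sync

    def _sync(self, key):
        # Apply every update this entry has missed since its last sync.
        owed = self.total - self.seen_total[key]
        self.q[key] += self.alpha * self.scaled_trace[key] * owed
        self.seen_total[key] = self.total

    def value(self, state, action):
        key = (state, action)
        self._sync(key)
        return self.q[key]

    def greedy_action(self, state, actions):
        # Only the entries of this state are brought up to date.
        return max(actions, key=lambda a: self.value(state, a))

    def visit(self, state, action):
        key = (state, action)
        self._sync(key)                          # settle updates under the old trace
        self.scaled_trace[key] += 1.0 / self.phi  # true trace grows by 1

    def apply_td_error(self, td_error):
        # O(1): no table entry is touched here.
        self.total += td_error * self.phi
        self.phi *= self.decay


def eager_reference(steps, alpha, gamma, lam):
    """Conventional version: every traced entry is updated at every step."""
    q, trace = defaultdict(float), defaultdict(float)
    for state, action, td_error in steps:
        trace[(state, action)] += 1.0
        for key in trace:
            q[key] += alpha * td_error * trace[key]
            trace[key] *= gamma * lam
    return q


# Made-up (state, action, td_error) triples, for illustration only.
steps = [(0, 1, 0.5), (1, 0, -0.2), (0, 1, 1.0), (2, 1, 0.3), (1, 0, 0.7)]
lazy = LazyTraceTable(alpha=0.1, gamma=0.9, lam=0.8)
for s, a, d in steps:
    lazy.visit(s, a)
    lazy.apply_td_error(d)
eager = eager_reference(steps, alpha=0.1, gamma=0.9, lam=0.8)
```

Reading any entry through `value` after the loop should return the same number as the eager table, up to floating-point rounding, although the lazy version never iterated over the whole table.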
Keywords
Reinforcement learning, Q-learning, TD(lambda), online Q(lambda), lazy learning