Speeding Up Q(lambda)-learning

Publication date

1998

Authors

Wiering, M.A.
Schmidhuber, J.

Document Type

Article in proceedings

Abstract

Q(lambda)-learning uses TD(lambda) methods to accelerate Q-learning. In previous online Q(lambda) implementations based on lookup tables, the worst-case complexity of a single update step is bounded by the size of the state-action space. The worst-case complexity of our faster algorithm is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
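
The postponement idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a simplified Q(lambda) variant with accumulating traces, a constant learning rate, and a single TD error per step. All names (`eager_q_lambda`, `LazyQLambda`, `sync`, `flush`, `phi`, `delta_sum`) are hypothetical. The eager version touches every state-action pair on each step. The lazy version stores traces divided by phi = (gamma * lambda)^t, keeps a running sum of TD errors weighted by phi, and brings a Q-value up to date only when it is read, so each step costs on the order of the number of actions.

```python
def eager_q_lambda(steps, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
    """Reference version: each step updates every state-action pair."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    trace = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in steps:
        td_error = r + gamma * max(Q[s_next]) - Q[s][a]
        for i in range(n_states):
            for j in range(n_actions):
                trace[i][j] *= gamma * lam      # decay all traces
        trace[s][a] += 1.0                      # accumulate trace of visited pair
        for i in range(n_states):
            for j in range(n_actions):
                Q[i][j] += alpha * td_error * trace[i][j]
    return Q


class LazyQLambda:
    """Simplified lazy variant: Q-values are updated only when needed."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        # Trace divided by phi, so it stays constant between visits.
        self.scaled_trace = [[0.0] * n_actions for _ in range(n_states)]
        # Value of delta_sum when each pair was last brought up to date.
        self.seen = [[0.0] * n_actions for _ in range(n_states)]
        self.phi = 1.0        # (gamma * lam) ** t
        self.delta_sum = 0.0  # sum over past steps of td_error * phi

    def sync(self, s, a):
        """Apply all postponed updates to Q[s][a] in constant time."""
        pending = self.delta_sum - self.seen[s][a]
        self.Q[s][a] += self.alpha * pending * self.scaled_trace[s][a]
        self.seen[s][a] = self.delta_sum

    def step(self, s, a, r, s_next):
        # Only the entries read below are brought up to date.
        self.sync(s, a)
        for b in range(self.n_actions):
            self.sync(s_next, b)
        td_error = r + self.gamma * max(self.Q[s_next]) - self.Q[s][a]
        self.phi *= self.gamma * self.lam
        self.scaled_trace[s][a] += 1.0 / self.phi
        self.delta_sum += td_error * self.phi

    def flush(self):
        """Bring every Q-value up to date, e.g. at the end of a trial."""
        for s in range(self.n_states):
            for a in range(self.n_actions):
                self.sync(s, a)
        return self.Q
```

On the same sequence of `(state, action, reward, next_state)` tuples, `flush()` should return the same table as `eager_q_lambda`, up to floating-point rounding. Because `phi` shrinks geometrically, `1.0 / self.phi` grows without bound on long trials, so a practical implementation would need to rescale or reset these quantities periodically; this sketch leaves that out.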

Keywords

Reinforcement learning, Q-learning, TD(lambda), online Q(lambda), lazy learning
