On learning soccer strategies
Publication date
1997
Authors
Salustowicz, R.
Wiering, M.A.
Schmidhuber, J.
Document Type
Article in proceedings
Abstract
We use simulated soccer to study multiagent learning. Each team's
players (agents) share an action set and a policy but may behave differently
due to position-dependent inputs. All agents making up a team are
rewarded or punished collectively whenever a goal is scored. We conduct
simulations with varying team sizes, and compare two learning
algorithms: TD-Q learning with linear neural networks (TD-Q) and
Probabilistic Incremental Program Evolution (PIPE). TD-Q is based on
evaluation functions (EFs) mapping input/action pairs to expected
reward, while PIPE searches policy space directly. PIPE uses an adaptive
probability distribution to synthesize programs that calculate action
probabilities from current inputs. Our results show that TD-Q has
difficulty learning appropriate shared EFs. PIPE, however, does not
depend on EFs and finds good policies faster and more reliably.
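
The idea of a shared linear evaluation function trained from a collective team reward can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain one-step Q-learning, one linear weight vector per action, and a terminal update at the moment a goal is scored. The names `SharedLinearQ` and `reward_team`, the input vectors, and the values of `lr` and `gamma` are all hypothetical.

```python
class SharedLinearQ:
    """A linear EF (one weight vector per action) shared by all agents of a team.

    Illustrative assumption: one-step Q-learning, not the paper's exact TD-Q setup.
    """

    def __init__(self, n_inputs, n_actions, lr=0.1, gamma=0.9):
        self.weights = [[0.0] * n_inputs for _ in range(n_actions)]
        self.lr = lr        # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_values(self, inputs):
        # Expected reward estimate for every action given one agent's inputs.
        return [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]

    def update(self, inputs, action, reward, next_inputs=None, terminal=False):
        # TD target: the reward, plus the discounted best next value if not terminal.
        target = reward
        if not terminal:
            target += self.gamma * max(self.q_values(next_inputs))
        td_error = target - self.q_values(inputs)[action]
        # Gradient step on the weights of the action that was taken.
        row = self.weights[action]
        for i, x in enumerate(inputs):
            row[i] += self.lr * td_error * x
        return td_error


def reward_team(q, last_steps, team_reward):
    """Give every agent the same reward for its last (inputs, action) pair."""
    for inputs, action in last_steps:
        q.update(inputs, action, team_reward, terminal=True)


# Two agents with different position-dependent inputs share one EF.
q = SharedLinearQ(n_inputs=2, n_actions=2, lr=0.5)
reward_team(q, [([1.0, 0.0], 0), ([0.0, 1.0], 1)], team_reward=1.0)
print(q.q_values([1.0, 0.0]))  # [0.5, 0.0]
```

Because every agent writes into the same weights, a goal reinforces whatever each teammate happened to be doing at the time, regardless of its actual contribution. This makes the shared-EF credit assignment problem concrete, though the sketch does not by itself reproduce the difficulties reported above.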