Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning
Publication date
2023-09-07
Editors
Malvone, Vadim
Murano, Aniello
Document Type
Part of book
License
taverne
Abstract
Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach achieves sample efficiency comparable to, if not better than, that of reward machines generated by hand for multi-agent tasks.
Keywords
multi-agent reinforcement learning, reward machines, automatic synthesis, Taverne
Citation
Varricchione, G, Alechina, N, Dastani, M & Logan, B 2023, Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning. in V Malvone & A Murano (eds), Multi-Agent Systems: 20th European Conference, EUMAS 2023, Naples, Italy, September 14–15, 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14282 LNAI, pp. 328–344. https://doi.org/10.1007/978-3-031-43264-4_21