Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Publication date

2023-09-07

Authors

Varricchione, Giovanni (ORCID 0000-0002-5466-9012; ISNI 0000000527856455)
Alechina, Natasha (ORCID 0000-0003-3306-9891; ISNI 0000000124421545)
Dastani, Mehdi (ISNI 0000000043464658)
Logan, Brian (ORCID 0000-0003-0648-7107; ISNI 0000000124462996)

Editors

Malvone, Vadim
Murano, Aniello

Document Type

Part of book

License

Taverne

Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes that the multi-agent reward machine is given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach yields sample efficiency comparable to, if not better than, that of hand-crafted reward machines for multi-agent tasks.
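
For readers unfamiliar with the term, a reward machine can be thought of as a finite-state machine whose transitions are triggered by high-level events and emit rewards. The Python sketch below is only an illustration of that general idea, not the synthesis or decomposition procedure of the paper. The class name `RewardMachine`, the state names `u0`/`u1`/`u2`, and the events `button` and `goal` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RewardMachine:
    """Illustrative reward machine: (state, event) -> (next state, reward)."""
    initial: str
    transitions: Dict[Tuple[str, str], Tuple[str, float]]
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, event: str) -> float:
        # Events with no matching transition leave the state unchanged
        # and yield zero reward (an assumption made for this sketch).
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def run(self, events: List[str]) -> float:
        # Reset to the initial state and accumulate reward over a trace.
        self.state = self.initial
        return sum(self.step(e) for e in events)


# Hypothetical team task: a button must be pressed before the goal is reached.
team_rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "button"): ("u1", 0.0),
        ("u1", "goal"): ("u2", 1.0),
    },
)
```

In this sketch, the trace `["button", "goal"]` completes the task and earns the reward, whereas reaching the goal without first pressing the button earns nothing.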

Keywords

multi-agent reinforcement learning, reward machines, automatic synthesis, Taverne

Citation

Varricchione, G, Alechina, N, Dastani, M & Logan, B 2023, Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning. in V Malvone & A Murano (eds), Multi-Agent Systems: 20th European Conference, EUMAS 2023, Naples, Italy, September 14–15, 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14282 LNAI, pp. 328–344. https://doi.org/10.1007/978-3-031-43264-4_21