Multi-objective reinforcement learning for provably incentivising alignment with value systems

Rodriguez-Soto, Manel; Rădulescu, Roxana; Bistaffa, Filippo; Ricart, Oriol; Mayoral-Macau, Arnau; Lopez-Sanchez, Maite; Rodriguez-Aguilar, Juan A.; Nowé, Ann

doi:https://doi.org/10.1016/j.artint.2025.104460

Files

1-s2.0-S0004370225001791-main.pdf (4.58 MB)

Publication date

2026-02

Authors

Rodriguez-Soto, Manel

Rădulescu, Roxana

Bistaffa, Filippo

Ricart, Oriol

Mayoral-Macau, Arnau

Lopez-Sanchez, Maite

Rodriguez-Aguilar, Juan A.

Nowé, Ann

DOI

https://doi.org/10.1016/j.artint.2025.104460

Document Type

Article

Collections

Utrecht University Repository

License

CC BY

Abstract

This paper addresses the problem of ensuring that autonomous learning agents align with multiple moral values. Specifically, we present the theoretical principles and algorithmic tools needed to design an environment in which the agent is guaranteed to learn a behaviour aligned with multiple moral values while pursuing its individual objective. To address this value alignment problem, we adopt the Multi-Objective Reinforcement Learning framework and propose a novel algorithm that combines techniques from Multi-Objective Reinforcement Learning and Linear Programming. We illustrate our value alignment process with an autonomous vehicle example, demonstrating that the agent learns to behave in alignment with the ethical values of safety, achievement, and comfort, with achievement representing the agent's individual objective. The resulting ethical behaviour differs depending on the ordering of the values. We also use a synthetic multi-objective environment to evaluate the computational cost of guaranteeing ethical learning as the number of values increases.

Keywords

Ethics, Multi-objective reinforcement learning, Value alignment, Language and Linguistics, Linguistics and Language, Artificial Intelligence

Citation

Rodriguez-Soto, M, Rădulescu, R, Bistaffa, F, Ricart, O, Mayoral-Macau, A, Lopez-Sanchez, M, Rodriguez-Aguilar, J A & Nowé, A 2026, 'Multi-objective reinforcement learning for provably incentivising alignment with value systems', Artificial Intelligence, vol. 351, 104460. https://doi.org/10.1016/j.artint.2025.104460

URI

https://dspace.library.uu.nl/handle/1874/479998
