TATRC: Triple Actor–Critic Structure with Regularization for better performance

Zhong, Taihong; Han, Shuai; Zhang, Yushu; Long, Zehong; Lü, Shuai; Wu, Junhong

DOI: https://doi.org/10.1016/j.ipm.2025.104452

Files

1-s2.0-S0306457325003930-main.pdf (5.76 MB)

Publication date

2026-03

Authors

Zhong, Taihong

Han, Shuai

Zhang, Yushu

Long, Zehong

Lü, Shuai

Wu, Junhong

DOI

https://doi.org/10.1016/j.ipm.2025.104452

Document Type

Article

Collections

Utrecht University Repository

License

taverne

Abstract

Current online reinforcement learning algorithms suffer from overestimation or underestimation of values caused by accumulated Bellman errors. Mainstream value estimation methods such as DDPG and TD3 fail to control this estimation bias effectively. The bias also affects the actor's update, making its update direction unstable and ultimately destabilizing the algorithm as a whole. This paper aims to reduce the value estimation bias of the algorithm while enhancing its exploration capability and stability. First, we introduce triple actors to achieve preliminary bias control and strengthen the algorithm's exploration ability. Second, within the triple-actor framework, we assign weights to the double-critic and single-critic structures according to the characteristics of each task. This ensemble further reduces value estimation bias and makes the updates of the triple critics more stable, thereby lowering algorithm variance and improving stability. Finally, we incorporate value regularization into each critic network to prevent large discrepancies between critics, which further enhances stability. Regularization keeps all critic networks close to one another and drives them to converge toward the target value, further reducing bias and supporting a more stable learning process. Combining these methods, we propose the Triple Actor–Critic Structure with Regularization (TATRC). Experimental results on six tasks from the MuJoCo benchmark show that TATRC significantly outperforms the baseline algorithms: it improves reward by more than 30% on the Walker2d-v2 and Ant-v2 tasks and increases average reward by more than 20% across all six tasks. These results validate the effectiveness of TATRC in bias control and algorithm stability, and TATRC achieves state-of-the-art performance on all six tasks.
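The abstract names three mechanisms (triple actors, a task-weighted blend of double-critic and single-critic value estimates, and value regularization across critics) but does not give their update equations. The following is a minimal plain-Python sketch of how such mechanisms could look under stated assumptions; it is not the authors' implementation. The function names select_action, weighted_target and critic_losses, the rule "pick the actor proposal with the highest critic value", the layout "critics 1 and 2 form the double-critic pair, critic 3 is the single critic", the blend weight beta and the regularization coefficient lam are all hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def select_action(candidates: Sequence[float], q_fn: Callable[[float], float]) -> float:
    """Return the candidate action (one per actor) with the highest critic value.

    Hypothetical rule: TATRC's actual way of combining its three actors may differ.
    """
    return max(candidates, key=q_fn)


def weighted_target(reward: float, done: float, gamma: float,
                    q_next: Sequence[float], beta: float) -> float:
    """Bootstrapped target that blends a double-critic and a single-critic estimate.

    q_next holds next-state values from three critics; by assumption the first two
    form the double-critic pair and the third is the single critic.
    beta is a hypothetical task-dependent weight in [0, 1].
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    q1, q2, q3 = q_next
    double_estimate = min(q1, q2)  # clipped double-Q, as in TD3 (tends to underestimate)
    single_estimate = q3           # single critic, as in DDPG (tends to overestimate)
    blended = beta * double_estimate + (1.0 - beta) * single_estimate
    # done = 1.0 marks a terminal transition, which cuts off bootstrapping.
    return reward + gamma * (1.0 - done) * blended


def critic_losses(q_values: Sequence[float], target: float, lam: float) -> list[float]:
    """Per-critic squared TD error plus a penalty on distance from the critics' mean.

    The mean-distance penalty with coefficient lam is one hypothetical form of
    value regularization that keeps the critics close to one another.
    """
    centre = mean(q_values)
    return [(q - target) ** 2 + lam * (q - centre) ** 2 for q in q_values]


# Usage with toy numbers: three critics value the next state at 10, 12 and 14.
y = weighted_target(reward=1.0, done=0.0, gamma=0.5, q_next=[10.0, 12.0, 14.0], beta=0.5)
print(y)  # 7.0: min(10, 12) = 10 and 14 blended equally give 12, then 1 + 0.5 * 12
print(critic_losses([10.0, 12.0, 14.0], target=y, lam=0.5))  # [11.0, 25.0, 51.0]
```

In this sketch, setting beta close to 1 leans on the pessimistic double-critic estimate and beta close to 0 on the optimistic single critic, which is one way a per-task weighting could trade off under- against overestimation; the regularization term grows with the spread of the critics and vanishes when they agree.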

Keywords

Actor–critic, Exploration, Regularization, Reinforcement learning, Value bias, Taverne, Information Systems, Media Technology, Computer Science Applications, Management Science and Operations Research, Library and Information Sciences

Citation

Zhong, T, Han, S, Zhang, Y, Long, Z, Lü, S & Wu, J 2026, 'TATRC: Triple Actor–Critic Structure with Regularization for better performance', Information Processing and Management, vol. 63, no. 2 pt. B, 104452. https://doi.org/10.1016/j.ipm.2025.104452

URI

https://dspace.library.uu.nl/handle/1874/479448
