TATRC: Triple Actor–Critic Structure with Regularization for better performance

Publication date

2026-03

Authors

Zhong, Taihong
Han, Shuai (ISNI 0000000523493781)
Zhang, Yushu
Long, Zehong
Lü, Shuai
Wu, Junhong

Document Type

Article

License

taverne

Abstract

Current online reinforcement learning algorithms suffer from overestimation or underestimation of values as a result of accumulated Bellman errors. Mainstream algorithms such as DDPG and TD3 fail to control this estimation bias effectively. The bias also affects the actor's update, making its update direction unstable and ultimately destabilizing the algorithm. This paper aims to reduce value estimation bias while enhancing exploration capability and stability. First, we introduce triple actors to achieve preliminary bias control and strengthen exploration. Second, within the triple-actor framework, we assign weights to the double-critic and single-critic structures based on the characteristics of each task. This ensemble approach further reduces value estimation bias and makes the updates of the triple critics more stable, thereby lowering variance and improving stability. Finally, we incorporate value regularization into each critic network to mitigate large discrepancies between critics. The regularization keeps all critic networks close to one another and drives them toward the target value, further reducing bias and supporting a more stable learning process. Combining these methods, we propose the Triple Actor–Critic Structure with Regularization (TATRC). Experimental results on six tasks from the MuJoCo benchmark show that TATRC significantly outperforms the baseline algorithms: it improves reward by over 30% on the Walker2d-v2 and Ant-v2 tasks and by more than 20% on average across all six tasks. These results validate the effectiveness of TATRC in bias control and algorithm stability, and the algorithm achieves state-of-the-art performance across the six tasks.
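
The abstract does not give formulas, so the following is only a minimal sketch of how a weighted mix of a double-critic and a single-critic estimate, together with a critic-agreement penalty, could look. The function names, the convex weighting, the use of the minimum for the double-critic term (borrowed from TD3's clipped double Q-learning), and the mean-squared-deviation penalty are all assumptions made for illustration, not the published TATRC update rules.

```python
from statistics import fmean


def ensemble_target(reward, done, q1_next, q2_next, q3_next,
                    weight=0.5, gamma=0.99):
    """Hypothetical TD target mixing double-critic and single-critic estimates.

    `weight` is an assumed task-dependent coefficient in [0, 1]; the paper's
    actual weighting scheme is not described in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Pessimistic estimate: minimum of two critics, as in TD3.
    double_estimate = min(q1_next, q2_next)
    # Single-critic estimate, as in DDPG; tends to be more optimistic.
    single_estimate = q3_next
    mixed = weight * double_estimate + (1.0 - weight) * single_estimate
    # No bootstrapping from the next state once the episode has ended.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * mixed


def critic_spread_penalty(q_values):
    """Hypothetical regularizer: mean squared deviation of the critics'
    predictions from their mean, which is zero when all critics agree."""
    center = fmean(q_values)
    return fmean([(q - center) ** 2 for q in q_values])


if __name__ == "__main__":
    target = ensemble_target(1.0, False, 10.0, 12.0, 14.0,
                             weight=0.5, gamma=0.5)
    print("target:", target)
    print("penalty:", critic_spread_penalty([10.0, 12.0, 14.0]))
```

In this sketch a weight near 1 leans on the pessimistic double-critic term and a weight near 0 leans on the single critic, which is one simple way to trade underestimation against overestimation per task.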

Keywords

Actor–critic, Exploration, Regularization, Reinforcement learning, Value bias, Taverne, Information Systems, Media Technology, Computer Science Applications, Management Science and Operations Research, Library and Information Sciences

Citation

Zhong, T, Han, S, Zhang, Y, Long, Z, Lü, S & Wu, J 2026, 'TATRC: Triple Actor–Critic Structure with Regularization for better performance', Information Processing and Management, vol. 63, no. 2 pt. B, 104452. https://doi.org/10.1016/j.ipm.2025.104452