TATRC: Triple Actor–Critic Structure with Regularization for better performance
Publication date
2026-03
Document Type
Article
License
taverne
Abstract
Current online reinforcement learning algorithms suffer from over- or underestimation of values caused by accumulated Bellman errors. Mainstream methods such as DDPG and TD3 fail to control this estimation bias effectively. The bias in turn affects the actor's update, destabilizing its update direction and ultimately the algorithm as a whole. This paper aims to reduce value estimation bias while enhancing exploration capability and stability. First, we introduce triple actors to achieve preliminary bias control and strengthen exploration. Second, within the triple-actor framework, we assign weights to the double-critic and single-critic structures according to the characteristics of each task. This ensemble approach further reduces value estimation bias and yields more stable updates of the triple critics, thereby lowering algorithm variance and improving stability. Finally, we incorporate value regularization into each critic network to mitigate large discrepancies between critics. The regularization keeps all critic networks close to one another and converging toward the target value, further reducing bias and supporting a more stable learning process. Combining these methods, we propose the Triple Actor–Critic Structure with Regularization (TATRC). Experimental results on six tasks from the MuJoCo benchmark show that TATRC significantly outperforms baseline algorithms: it achieves a reward improvement of over 30% on the Walker2d-v2 and Ant-v2 tasks, and an average reward increase of more than 20% across all six tasks. These results validate the effectiveness of TATRC in bias control and algorithm stability, and the algorithm achieves state-of-the-art performance across the six tasks.
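The abstract does not give the update equations, so the following is only a minimal sketch of how a weighted double-critic/single-critic target and a critic regularizer might look. Everything in it is an assumption made for illustration, not the paper's formulation: the function names, the fixed blending weight `beta`, the use of an elementwise minimum for the double-critic estimate, and the mean-deviation penalty weighted by `lam`. The triple-actor component is not shown.

```python
import numpy as np


def ensemble_target(q1, q2, q3, reward, done, beta=0.5, gamma=0.99):
    """Hypothetical TD target blending a double-critic and a single-critic estimate.

    q1, q2, q3: next-state value estimates from three target critics, shape (batch,).
    beta: assumed task-dependent weight on the double-critic estimate.
    """
    double_q = np.minimum(q1, q2)  # clipped double-Q, as in TD3 (pessimistic)
    single_q = q3                  # single critic, as in DDPG (optimistic)
    blended = beta * double_q + (1.0 - beta) * single_q
    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * blended


def critic_loss_with_regularization(q_preds, target, lam=0.1):
    """Mean squared TD error plus an assumed penalty on disagreement between critics."""
    q = np.stack(q_preds)                   # shape (n_critics, batch)
    td_loss = np.mean((q - target) ** 2)    # pulls every critic toward the target
    mean_q = q.mean(axis=0, keepdims=True)  # per-sample average over critics
    reg = np.mean((q - mean_q) ** 2)        # pulls critics toward one another
    return td_loss + lam * reg
```

Under these assumptions, `beta` near 1 leans on the pessimistic double-critic estimate and `beta` near 0 leans on the single critic, which is one way to read the abstract's task-dependent weighting; the regularization term vanishes when all critics agree.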
Keywords
Actor–critic, Exploration, Regularization, Reinforcement learning, Value bias, Taverne, Information Systems, Media Technology, Computer Science Applications, Management Science and Operations Research, Library and Information Sciences
Citation
Zhong, T, Han, S, Zhang, Y, Long, Z, Lü, S & Wu, J 2026, 'TATRC: Triple Actor–Critic Structure with Regularization for better performance', Information Processing and Management, vol. 63, no. 2 pt. B, 104452. https://doi.org/10.1016/j.ipm.2025.104452