Influence of Gaussian distribution on performance metrics in continuous reinforcement learning

Zhou, Ruikai; Zhong, Taihong; Zhu, Wenbo; Han, Shuai; Lü, Shuai

doi:https://doi.org/10.1016/j.ipm.2025.104428

Files

1-s2.0-S0306457325003693-main.pdf (9.42 MB)

Publication date

2026-03

Authors

Zhou, Ruikai

Zhong, Taihong

Zhu, Wenbo

Han, Shuai

Lü, Shuai

DOI

https://doi.org/10.1016/j.ipm.2025.104428

Document Type

Article

Collections

Utrecht University Repository

License

taverne

Abstract

We investigate the influence of the entropy of the Gaussian distribution (Formula presented) through a case study on Proximal Policy Optimization (PPO). Specifically, we examine the impact of (Formula presented) in both the initial distribution and the training distribution, and find that changes in (Formula presented) directly affect the exploration performance and sample efficiency of the algorithm. Based on this observation, we propose a two-stage training method: we first train the variance independently and then use the trained variance to train the mean. The effectiveness of this approach is justified theoretically by treating reinforcement learning as a probabilistic inference problem. Applying the two-stage training method to PPO yields Positive-Proximal Policy Optimization (P-PPO). We test it with over 2000 agents on four types of MuJoCo benchmarks; MuJoCo is a physics engine that serves as a simulator for control systems. Our results show that, compared with PPO, P-PPO achieves significant improvements in several key performance metrics: sample efficiency improves by 111%, exploration performance increases by 137%, and robustness is enhanced by 46% and 80% in the walker2d and halfCheetah environments, respectively. Additionally, delayed gratification increases by 223%, 12%, 4%, and 90% in the swimmer, hopper, walker2d, and halfCheetah environments, respectively.
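As background for the abstract's focus on entropy, the following is a minimal sketch of a standard textbook fact rather than the paper's own formulation: the differential entropy of a diagonal Gaussian depends only on its standard deviations, not on its mean. The function name `gaussian_entropy` and the example values are illustrative assumptions and do not come from the article.

```python
import math

def gaussian_entropy(log_std):
    """Differential entropy (in nats) of a diagonal Gaussian.

    `log_std` holds one log standard deviation per action dimension.
    Per dimension, H = 0.5 * log(2 * pi * e) + log(sigma); the mean
    does not appear in the expression.
    """
    const = 0.5 * math.log(2 * math.pi * math.e)
    return sum(const + ls for ls in log_std)

# Hypothetical 3-dimensional action space with two choices of sigma.
narrow = gaussian_entropy([math.log(0.1)] * 3)  # sigma = 0.1 per dimension
wide = gaussian_entropy([math.log(1.0)] * 3)    # sigma = 1.0 per dimension
print(narrow < wide)  # a larger sigma gives a higher-entropy distribution
```

Under this textbook expression, widening the standard deviation raises the entropy regardless of where the mean sits, which is one way to read the abstract's separation of variance training from mean training.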

Keywords

Continuous reinforcement learning, Control systems, Entropy, Gaussian distribution, Taverne, Information Systems, Media Technology, Computer Science Applications, Management Science and Operations Research, Library and Information Sciences

Citation

Zhou, R, Zhong, T, Zhu, W, Han, S & Lü, S 2026, 'Influence of Gaussian distribution on performance metrics in continuous reinforcement learning', Information Processing and Management, vol. 63, no. 2, 104428. https://doi.org/10.1016/j.ipm.2025.104428

URI

https://dspace.library.uu.nl/handle/1874/478863
