Influence of Gaussian distribution on performance metrics in continuous reinforcement learning
Publication date
2026-03
Document Type
Article
License
taverne
Abstract
We investigate the influence of the entropy of the Gaussian distribution (Formula presented) through a case study on Proximal Policy Optimization (PPO). Specifically, we examine the impact of (Formula presented) in the initial distribution and the training distribution, and find that changes in (Formula presented) directly affect the exploration performance and sample efficiency of the algorithm. Based on this observation, we propose a two-stage training method in which we first train the variance independently and then use the trained variance to train the mean. The effectiveness of this approach is justified theoretically by treating reinforcement learning as a probabilistic inference problem. Applying the two-stage training method to PPO yields Positive-Proximal Policy Optimization (P-PPO). We test it on over 2000 agents across four types of MuJoCo benchmarks, which are provided by a physics engine that serves as a simulator for control systems. Our results demonstrate that, compared with PPO, P-PPO achieves significant improvements in several key performance metrics: sample efficiency improves by 111%, exploration performance increases by 137%, and robustness is enhanced by 46% and 80% in the walker2d and halfCheetah environments, respectively. Additionally, delayed gratification increases by 223%, 12%, 4%, and 90% in the swimmer, hopper, walker2d, and halfCheetah environments, respectively.
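As a minimal sketch of why the variance alone governs the entropy discussed in the abstract, the standard closed form for the differential entropy of a diagonal Gaussian can be computed as below. The function name `gaussian_entropy` and the log-standard-deviation parameterization are assumptions made for illustration; they are not taken from the paper and do not reproduce its notation or implementation.

```python
import math


def gaussian_entropy(log_std):
    """Differential entropy (in nats) of a diagonal Gaussian.

    `log_std` is a sequence of per-dimension log standard deviations
    (a hypothetical parameterization chosen for this sketch).
    Per dimension: H = 0.5 * log(2 * pi * e) + log(sigma).
    The mean does not appear: entropy depends only on the variance.
    """
    const = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(const + ls for ls in log_std)


# Unit variance in one dimension.
h_unit = gaussian_entropy([0.0])

# Doubling sigma adds log(2) nats per dimension.
h_wide = gaussian_entropy([math.log(2.0)])
```

Because the mean drops out of the expression, adjusting the variance changes the entropy of the policy, and hence how widely it samples actions, without moving its center.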
Keywords
Continuous reinforcement learning, Control systems, Entropy, Gaussian distribution, Taverne, Information Systems, Media Technology, Computer Science Applications, Management Science and Operations Research, Library and Information Sciences
Citation
Zhou, R, Zhong, T, Zhu, W, Han, S & Lü, S 2026, 'Influence of Gaussian distribution on performance metrics in continuous reinforcement learning', Information Processing and Management, vol. 63, no. 2, 104428. https://doi.org/10.1016/j.ipm.2025.104428