Large-scale simulation study of active learning models for systematic reviews

Publication date

2025

Authors

Teijema, Jelle Jasper (ISNI 0000000507449721)
Bruin, Jonathan de (ORCID 0000-0002-4297-0502; ISNI 000000051803672X)
Bagheri, Ayoub (ORCID 0000-0001-6366-2173; ISNI 0000000492835784)
Schoot, Rens van de (ISNI 0000000393562696)

Document Type

Article
Open Access

License

CC BY-NC-ND

Abstract

Despite progress in active learning, its evaluation remains limited by constraints on simulation size, infrastructure, and dataset availability. This study advocates large-scale simulation as the gold standard for evaluating active learning models in systematic review screening. Two large-scale simulations, totaling over 29,000 runs, assessed active learning solutions. The first evaluated 13 combinations of classification models and feature extraction techniques on high-quality datasets from SYNERGY. The second expanded this to 92 model combinations by adding classifiers and feature extractors. In every scenario tested, active learning outperformed random screening, offering significant efficiency gains. The size of the gain varied with the dataset, the model, and the stage of screening, ranging from considerable to near-flawless. Because model performance differs, active learning systems should remain adaptable to new classifiers and feature extraction techniques. The publicly available results underscore the importance of open benchmarking for reproducibility and for the development of robust, generalizable active learning strategies.
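
To make the comparison in the abstract concrete, the following is a minimal sketch of what a single simulation run can look like. It is not the authors' code or any particular tool's API. The synthetic two-feature records, the nearest-centroid ranking, and every function name (make_records, active_order, random_order, recall_at) are assumptions made for illustration. A fully labeled dataset is "screened" one record at a time, the model is refit after every decision, and recall at a fixed screening budget is compared with a random screening order.

```python
import random


def make_records(n=200, n_relevant=20, seed=42):
    """Synthetic stand-in for a labeled review dataset: (features, is_relevant)."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        relevant = i < n_relevant
        center = 3.0 if relevant else 0.0  # assumed separation between classes
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        records.append((features, relevant))
    rng.shuffle(records)
    return records


def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    return tuple(sum(p[d] for p in points) / len(points) for d in range(2))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def active_order(records):
    """Screening order when the top-ranked unlabeled record is screened next."""
    # Prior knowledge: one relevant and one irrelevant record (counted as screened).
    first_rel = next(i for i, (_, rel) in enumerate(records) if rel)
    first_irr = next(i for i, (_, rel) in enumerate(records) if not rel)
    order = [first_rel, first_irr]
    unlabeled = set(range(len(records))) - set(order)
    while unlabeled:
        # "Retrain": recompute both class centroids from all labels seen so far.
        rel_c = centroid([records[i][0] for i in order if records[i][1]])
        irr_c = centroid([records[i][0] for i in order if not records[i][1]])
        # Higher score = closer to the relevant centroid; ties broken by lower index.
        best = max(
            unlabeled,
            key=lambda i: (
                sq_dist(records[i][0], irr_c) - sq_dist(records[i][0], rel_c),
                -i,
            ),
        )
        order.append(best)  # screening the record reveals its true label
        unlabeled.remove(best)
    return order


def random_order(records, seed=0):
    """Baseline: screen records in a shuffled order."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    return order


def recall_at(records, order, fraction):
    """Share of all relevant records found after screening `fraction` of the data."""
    k = int(len(order) * fraction)
    found = sum(records[i][1] for i in order[:k])
    total = sum(rel for _, rel in records)
    return found / total


records = make_records()
al = recall_at(records, active_order(records), 0.5)
rnd = recall_at(records, random_order(records), 0.5)
print(f"recall after screening 50%: active={al:.2f}, random={rnd:.2f}")
```

A large-scale study repeats runs of this kind across many datasets, starting records, and model combinations. The toy data here is deliberately easy to separate, so the gap it shows says nothing about the magnitudes reported in the article.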

Keywords

Active learning, Large-scale simulation, Screening phase, Systematic review

Citation

Teijema, J J, de Bruin, J, Bagheri, A & van de Schoot, R 2025, 'Large-scale simulation study of active learning models for systematic reviews', International Journal of Data Science and Analytics, vol. 20, no. 6, pp. 5435-5456. https://doi.org/10.1007/s41060-025-00777-0