Using Chao’s Estimator as a Stopping Criterion for Technology-Assisted Review

Publication date

2025-05-31

Authors

Bron, Michiel P. (ORCID 0000-0002-4823-6085)
van der Heijden, Peter (ISNI 0000000067738801)
Feelders, Ad (ISNI 0000000350720316)
Siebes, Arno (ISNI 0000000114727321)

Document Type

Article
Open Access

License

CC BY

Abstract

Technology-Assisted Review aims to reduce the human effort required for screening processes such as abstract screening for Systematic Literature Reviews. Human reviewers label documents as relevant or irrelevant during this process, while the system incrementally updates a prediction model based on the reviewers’ previous decisions. After each model update, the system proposes new documents it deems relevant, so that relevant documents are prioritized over irrelevant ones. A stopping criterion is necessary to guide users in deciding when to stop the review process, so as to minimize both the number of missed relevant documents and the number of irrelevant documents read. In this article, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao’s Population Size Estimator, which estimates the prevalence of relevant documents in the dataset. In our simulation study, we compare this criterion to other methods presented in the literature and show that it performs well on several datasets.
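
To make the idea in the abstract more concrete, the following is a minimal sketch of the classical Chao lower-bound population size estimator, N = n + f1^2 / (2 * f2), applied to a review setting. It is an illustration under stated assumptions and not necessarily the exact variant or stopping rule used in the article. The assumptions are: each relevant document found so far comes with a count of how many ensemble members proposed it, and the review stops once the number of documents found reaches a chosen fraction of the estimate. The names `chao_estimate`, `should_stop`, `capture_counts`, and `target_recall` are hypothetical.

```python
from collections import Counter


def chao_estimate(capture_counts):
    """Classical Chao lower-bound estimate of the total population size.

    capture_counts: one positive integer per distinct relevant document found,
    giving how many ensemble members proposed it (an assumed data layout).
    """
    n = len(capture_counts)        # distinct relevant documents observed
    freq = Counter(capture_counts)
    f1, f2 = freq[1], freq[2]      # documents seen exactly once / exactly twice
    if f2 > 0:
        return n + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no document was seen exactly twice.
    return n + f1 * (f1 - 1) / 2


def should_stop(capture_counts, target_recall=0.95):
    """Hypothetical stopping rule: stop when found / estimated >= target_recall."""
    n_found = len(capture_counts)
    if n_found == 0:
        return False  # nothing found yet, so no basis for an estimate
    return n_found / chao_estimate(capture_counts) >= target_recall


# Ten relevant documents found; four seen once, two seen twice.
counts = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
print(chao_estimate(counts))  # 10 + 4**2 / (2 * 2) = 14.0
print(should_stop(counts))    # 10 / 14 is below 0.95, so False
```

Many documents seen only once suggest that a substantial part of the relevant population is still unseen, which pushes the estimate up and delays stopping. The 0.95 threshold is an arbitrary illustrative value.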

Keywords

active learning, datasets, information retrieval, machine learning, population size estimation, stopping criteria, technology-assisted review, Information Systems, General Business, Management and Accounting, Computer Science Applications

Citation

Bron, M, van der Heijden, P G M, Feelders, A & Siebes, A 2025, 'Using Chao’s Estimator as a Stopping Criterion for Technology-Assisted Review', ACM Transactions on Information Systems, vol. 43, no. 3, 81, pp. 1-51. https://doi.org/10.1145/3724116