Using Chao’s Estimator as a Stopping Criterion for Technology-Assisted Review
Publication date
2025-05-31
Document Type
Article
License
CC BY
Abstract
Technology-Assisted Review aims to reduce the human effort required for screening processes such as abstract screening for Systematic Literature Reviews. During this process, human reviewers label documents as relevant or irrelevant, while the system incrementally updates a prediction model based on the reviewers’ previous decisions. After each model update, the system proposes new documents it deems relevant, so that relevant documents are prioritized over irrelevant ones. A stopping criterion is needed to tell users when to stop reviewing, so as to minimize both the number of relevant documents missed and the number of irrelevant documents read. In this article, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao’s Population Size Estimator, which estimates the prevalence of relevant documents in the dataset. Our simulation study compares this criterion with other methods presented in the literature and shows that it performs well on several datasets.
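To make the idea in the abstract concrete, the following is a minimal sketch of how a Chao-type estimate could drive a stopping decision. It is not the authors' implementation: it uses the classic Chao lower-bound formula, and it assumes that each relevant document found so far carries a count of how many ensemble members retrieved it. The function names, the `capture_counts` input, and the `target_recall` threshold are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def chao_estimate(capture_counts):
    """Classic Chao lower-bound estimate of the population size.

    capture_counts: one positive integer per distinct relevant document
    found so far, giving how many ensemble members retrieved it
    (an assumed input format, not taken from the article).
    """
    counts = [c for c in capture_counts if c > 0]
    n = len(counts)                 # distinct relevant documents observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]       # documents seen exactly once / twice
    if f2 > 0:
        return n + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here when no document was seen exactly twice.
    return n + f1 * (f1 - 1) / 2


def should_stop(capture_counts, target_recall=0.95):
    """Stop once the found documents cover the target share of the estimate."""
    estimate = chao_estimate(capture_counts)
    if estimate == 0:
        return False                # nothing found yet, so keep reviewing
    found = sum(1 for c in capture_counts if c > 0)
    return found / estimate >= target_recall


# Hypothetical counts: 4 documents found by one member, 2 by two, 4 by three.
counts = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
print(chao_estimate(counts))        # 14.0
print(should_stop(counts))          # False
```

In this sketch, many documents found by only one ensemble member push the estimate well above the observed count, which signals that relevant documents probably remain and that reviewing should continue.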
Keywords
active learning, datasets, information retrieval, machine learning, population size estimation, stopping criteria, technology-assisted review, Information Systems, General Business, Management and Accounting, Computer Science Applications
Citation
Bron, M, van der Heijden, P G M, Feelders, A & Siebes, A 2025, 'Using Chao’s Estimator as a Stopping Criterion for Technology-Assisted Review', ACM Transactions on Information Systems, vol. 43, no. 3, 81, pp. 1-51. https://doi.org/10.1145/3724116