Combining Large Language Model Classifications and Active Learning for Improved Technology-Assisted Review

Bron, Michiel P.; Greijn, Berend; Coimbra, Bruno Messina; van de Schoot, Rens; Bagheri, Ayoub

Files

IJNS-12-00001.pdf (985.19 KB)

Publication date

2024-09-30

Authors

Bron, Michiel Pieter

Greijn, Berend

Messina Coimbra, Bruno

Schoot, Rens van de

Bagheri, Ayoub

Document Type

Contribution to journal: conference article

Collections

UMC Repository

License

CC BY

Abstract

Technology-assisted review (TAR) is software that aids in high-recall information retrieval tasks, such as abstract screening for systematic literature reviews. TAR systems often use a form of Active Learning (AL): human reviewers label documents as relevant or irrelevant according to a screening protocol, while the system incrementally updates a classifier based on the reviewers’ previous decisions. After each model update, the system uses the classifier to rerank the remaining documents so that those predicted to be relevant are screened before those predicted to be irrelevant, which reduces the screening workload. Recent studies have examined whether Large Language Models (LLMs) alone can perform this task, by prompting the LLM with the task description, the screening protocol, and a document from the corpus; the LLM then classifies the document in question. While the results of these studies are promising, the LLM’s predictions are not error-free, so recall or precision is lower than desired. In this work, we propose a new Active Learning method for TAR that integrates the LLM’s classifications into the review process. The method may correct some of the shortcomings of the LLM results while reducing workload compared with current TAR systems.
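The active learning loop described in the abstract (label, retrain, rerank, screen the next document) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say how LLM classifications are integrated, so the optional `llm_labels` argument, which treats LLM classifications as provisional training labels that human decisions override, is a hypothetical illustration in the spirit of the "weak supervision" keyword. The naive Bayes classifier, the helper names `screen` and `recall_at`, the toy corpus, and the one-document-per-iteration query strategy are likewise assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative classifier choice)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}  # token counts per class (0 = irrelevant, 1 = relevant)
        self.doc_counts = Counter(labels)           # number of training documents per class
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def log_odds(self, doc):
        """Smoothed log P(relevant | doc) - log P(irrelevant | doc); higher means more likely relevant."""
        n, v = sum(self.doc_counts.values()), len(self.vocab)
        score = math.log((self.doc_counts[1] + 1) / (n + 2)) - math.log((self.doc_counts[0] + 1) / (n + 2))
        for tok in tokenize(doc):
            p1 = (self.counts[1][tok] + 1) / (self.totals[1] + v)
            p0 = (self.counts[0][tok] + 1) / (self.totals[0] + v)
            score += math.log(p1 / p0)
        return score


def screen(corpus, oracle, seed_ids, budget, llm_labels=None):
    """Simulate TAR screening: retrain after every human decision and screen the top-ranked document next.

    llm_labels (HYPOTHETICAL): optional {doc_id: 0/1} LLM classifications used as provisional
    training labels for documents no human has screened yet; human labels always take precedence.
    """
    labeled = {i: oracle(i) for i in seed_ids}  # human screening decisions
    order = list(seed_ids)                      # order in which documents were screened
    while len(labeled) < min(budget, len(corpus)):
        train = dict(llm_labels or {})
        train.update(labeled)  # human decisions override LLM labels
        ids = list(train)
        model = NaiveBayes().fit([corpus[i] for i in ids], [train[i] for i in ids])
        # Rerank the unscreened pool and hand the most likely relevant document to the reviewer.
        pool = [i for i in range(len(corpus)) if i not in labeled]
        best = max(pool, key=lambda i: model.log_odds(corpus[i]))
        labeled[best] = oracle(best)
        order.append(best)
    return order, labeled


def recall_at(order, truth, k):
    """Fraction of all relevant documents found among the first k screened."""
    relevant = sum(truth)
    return sum(truth[i] for i in order[:k]) / relevant if relevant else 1.0


# Toy corpus with known ground truth (1 = relevant); the "human reviewer" is simulated by the truth list.
corpus = [
    "deep learning screening systematic review",
    "cooking pasta recipe tomato",
    "active learning systematic review screening",
    "football match results league",
    "systematic review automation learning",
    "tomato garden soil",
]
truth = [1, 0, 1, 0, 1, 0]
order, labeled = screen(corpus, lambda i: truth[i], seed_ids=[0, 1], budget=4)
print(order, recall_at(order, truth, 4))
```

In this toy run, the reranking places the relevant documents ahead of the irrelevant ones, so all relevant documents are found after screening only four of six. Passing, for example, `llm_labels={2: 1, 3: 0, 4: 1, 5: 0}` shows where (hypothetically) imperfect LLM classifications could enter the loop without ever overriding a human decision.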

Keywords

active learning, information retrieval, large language model, technology-assisted review, weak supervision, General Computer Science, SDG 3 - Good Health and Well-being

Citation

Bron, M P, Greijn, B, Coimbra, B M, van de Schoot, R & Bagheri, A 2024, 'Combining Large Language Model Classifications and Active Learning for Improved Technology-Assisted Review', CEUR Workshop Proceedings, vol. 3770, pp. 77-95.

URI

https://dspace.library.uu.nl/handle/1874/469195
