Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text

Publication date

2024-11-18

Authors

van Dalfsen, Arjan (ORCID 0000-0002-4209-4063)
Karsdorp, Folgert
Bagheri, Ayoub (ORCID 0000-0001-6366-2173, ISNI 0000000492835784)
Stronks, Els (ISNI 0000000110516340)
Mentink, Dieuwertje
van Engelen, Thirza

Document Type

Part of book
Open Access

License

CC BY

Abstract

This study explores the use of generative AI (GenAI) for annotation in the humanities, comparing direct and indirect annotation approaches with human annotations. Direct annotation uses GenAI to annotate the entire corpus, while indirect annotation uses GenAI to create training data for a specialized model. The research investigates zero-shot and few-shot methods for direct annotation, alongside an indirect approach incorporating active learning, few-shot prompting, and k-NN example retrieval. The task focuses on identifying words (also referred to as entities) related to plants and animals in Early Modern Dutch texts. Results show that indirect annotation outperforms zero-shot direct annotation in mimicking human annotations. However, with just a few examples, direct annotation catches up, achieving performance similar to that of indirect annotation. Analysis of confusion matrices reveals that the GenAI annotators make similar types of mistakes, such as confusing parts and products or failing to identify entities, and that these mistakes span a broader range than those made by humans. Manual error analysis indicates that each annotation method (human, direct, and indirect) has some unique errors. Given the limited scale of this study, further exploration of the relative affordances of direct and indirect GenAI annotation methods is worthwhile.
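
To make the idea of k-NN example retrieval for few-shot prompting concrete, the sketch below selects the k annotated sentences most similar to a new sentence and places them in a prompt. It is an illustration only, not the authors' implementation: the bag-of-words cosine similarity is a stand-in assumption for whatever representation the study used, the function names (`retrieve_examples`, `build_prompt`) and the prompt wording are hypothetical, and the example pool consists of invented English sentences rather than data from the study.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (an assumed stand-in for real embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_examples(query, pool, k=2):
    """Return the k annotated examples most similar to the query sentence."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Assemble a few-shot prompt from retrieved examples (hypothetical wording)."""
    lines = ["Mark the words that refer to plants or animals."]
    for ex in examples:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Entities: {', '.join(ex['entities']) or 'none'}")
    lines.append(f"Text: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Hypothetical annotated pool: invented sentences, not data from the study.
POOL = [
    {"text": "the dog chased the cat", "entities": ["dog", "cat"]},
    {"text": "an oak tree stood by the river", "entities": ["oak"]},
    {"text": "the ship sailed at dawn", "entities": []},
]

if __name__ == "__main__":
    query = "the cat sat under the oak tree"
    print(build_prompt(query, retrieve_examples(query, POOL, k=2)))
```

In a direct-annotation setting the resulting prompt would be sent to the GenAI model for each sentence in the corpus; in an indirect setting the model's answers would instead serve as training data for a smaller specialized model.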

Citation

van Dalfsen, A, Karsdorp, F, Bagheri, A, Stronks, E, Mentink, D & van Engelen, T 2024, Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text. in Computational Humanities Research 2024: Proceedings of the Computational Humanities Research Conference 2024 Aarhus, Denmark, December 4-6, 2024. CEUR WS, pp. 1053-1074. <https://ceur-ws.org/Vol-3834/>