Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text
Publication date
2024-11-18
Document Type
Part of book
License
CC BY
Abstract
This study explores the use of generative AI (GenAI) for annotation in the humanities, comparing direct and indirect annotation approaches with human annotations. Direct annotation uses GenAI to annotate the entire corpus, while indirect annotation uses GenAI to create training data for a specialized model. The research investigates zero-shot and few-shot methods for direct annotation, alongside an indirect approach incorporating active learning, few-shot prompting, and k-NN example retrieval. The task focuses on identifying words (also referred to as entities) related to plants and animals in Early Modern Dutch texts. Results show that indirect annotation outperforms zero-shot direct annotation in mimicking human annotations. However, with just a few examples, direct annotation catches up, achieving performance similar to that of indirect annotation. Analysis of confusion matrices reveals that the GenAI annotators make similar types of mistakes, such as confusing parts and products or failing to identify entities, and that these mistakes are broader than those made by humans. Manual error analysis indicates that each annotation method (human, direct, and indirect) has some unique errors. Given the limited scale of this study, it is worthwhile to further explore the relative affordances of direct and indirect GenAI annotation methods.
Citation
van Dalfsen, A, Karsdorp, F, Bagheri, A, Stronks, E, Mentink, D & van Engelen, T 2024, Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text. in Computational Humanities Research 2024: Proceedings of the Computational Humanities Research Conference 2024, Aarhus, Denmark, December 4-6, 2024. CEUR WS, pp. 1053-1074. <https://ceur-ws.org/Vol-3834/>