Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text

van Dalfsen, Arjan; Karsdorp, Folgert; Bagheri, Ayoub; Stronks, Els; Mentink, Dieuwertje; van Engelen, Thirza

Files

paper74.pdf (6.3 MB)

Publication date

2024-11-18

Authors

van Dalfsen, Arjan

Karsdorp, Folgert

Bagheri, Ayoub

Stronks, Els

Mentink, Dieuwertje

van Engelen, Thirza

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY (Creative Commons Attribution)

Abstract

This study explores the use of generative AI (GenAI) for annotation in the humanities, comparing direct and indirect annotation approaches with human annotations. Direct annotation uses GenAI to annotate the entire corpus, while indirect annotation uses GenAI to create training data for a specialized model. The research investigates zero-shot and few-shot methods for direct annotation, alongside an indirect approach that incorporates active learning, few-shotting, and k-NN example retrieval. The task is to identify words (also referred to as entities) related to plants and animals in Early Modern Dutch texts. Results show that indirect annotation outperforms zero-shot direct annotation in reproducing human annotations. However, given just a few examples, direct annotation catches up and performs on par with indirect annotation. Analysis of confusion matrices reveals that the GenAI annotators make similar types of mistakes, such as confusing parts with products or failing to identify entities, and that these mistakes are broader than those made by human annotators. Manual error analysis indicates that each annotation method (human, direct, and indirect) produces some errors unique to it. Given the limited scale of this study, the relative affordances of direct and indirect GenAI annotation methods merit further exploration.
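The abstract mentions k-NN example retrieval for selecting few-shot examples. The following is a minimal sketch of that general idea, not the study's implementation. It assumes plain bag-of-words cosine similarity in place of whatever text representation the authors used, and the prompt wording, the toy sentences, and the names `knn_examples` and `build_prompt` are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace-token counts (a simple stand-in for real embeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_examples(query: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Hypothetical helper: return the k annotated examples most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: list[dict], k: int = 2) -> str:
    """Hypothetical helper: assemble a few-shot prompt from the retrieved neighbours."""
    lines = ["Mark every word that refers to an animal or a plant."]  # assumed instruction
    for ex in knn_examples(query, pool, k):
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Entities: {', '.join(ex['entities'])}")
    lines.append(f"Text: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Invented toy pool; these sentences are illustrative and not from the study's corpus.
pool = [
    {"text": "de koe gaf melk", "entities": ["koe", "melk"]},
    {"text": "een roos in den hof", "entities": ["roos"]},
    {"text": "het paert at haver in de wey", "entities": ["paert", "haver"]},
]

print(build_prompt("de koe gaf geen melk", pool))
```

Retrieving neighbours per query, rather than using one fixed set of examples, means each prompt shows the model annotated sentences that share vocabulary with the text being annotated. A real setup would likely swap `bag_of_words` for a sentence-embedding model.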

Citation

van Dalfsen, A, Karsdorp, F, Bagheri, A, Stronks, E, Mentink, D & van Engelen, T 2024, Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text. In Computational Humanities Research 2024: Proceedings of the Computational Humanities Research Conference 2024, Aarhus, Denmark, December 4-6, 2024. CEUR-WS, pp. 1053-1074. https://ceur-ws.org/Vol-3834/

URI

https://dspace.library.uu.nl/handle/1874/482492