Soft metrics for evaluation with disagreements: an assessment
Publication date
2024-05-21
Editors
Abercrombie, Gavin
Basile, Valerio
Bernardi, Davide
Dudy, Shiran
Frenda, Simona
Havens, Lucy
Tonelli, Sara
Document Type
Part of book
License
CC BY-NC
Abstract
The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy them. We employ a theoretical framework, supported by a visual approach, practical examples, and the analysis of a real case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for the case of binary classification.
Keywords
Language and Linguistics, Education, Library and Information Sciences, Linguistics and Language
Citation
Rizzi, G, Leonardelli, E, Poesio, M, Uma, A, Pavlovic, M, Paun, S, Rosso, P & Fersini, E 2024, Soft metrics for evaluation with disagreements: an assessment. in G Abercrombie, V Basile, D Bernardi, S Dudy, S Frenda, L Havens & S Tonelli (eds), 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings. European Language Resources Association (ELRA), pp. 84-94, 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024, Torino, Italy, 21/05/24. <https://aclanthology.org/2024.nlperspectives-1.9>