Soft metrics for evaluation with disagreements: an assessment

Publication date

2024-05-21

Authors

Rizzi, Giulia
Leonardelli, Elisa
Poesio, Massimo (ORCID 0000-0001-8469-2072; ISNI 0000000124478066)
Uma, Alexandra
Pavlovic, Maja
Paun, Silviu
Rosso, Paolo
Fersini, Elisabetta

Editors

Abercrombie, Gavin
Basile, Valerio
Bernardi, Davide
Dudy, Shiran
Frenda, Simona
Havens, Lucy
Tonelli, Sara

Document Type

Part of book

License

CC BY-NC

Abstract

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for binary classification.
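
The three candidate metrics named in the abstract can be compared directly on soft labels. The sketch below is a minimal illustration, not the authors' code. It assumes that each item's gold label is a probability vector (for example, the proportion of annotators choosing each class), that the model outputs a probability vector over the same classes, and that Cross Entropy uses the natural logarithm. The function names are hypothetical. The example shows one well-known property of Cross Entropy: its minimum over predictions equals the entropy of the target rather than zero. As a result, a perfect prediction on an item where annotators are split can score worse than an imperfect prediction on an item with little disagreement. This is not necessarily one of the specific cases the paper analyses.

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) = -sum_i t_i * ln(p_i); eps guards against ln(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))

def manhattan(target, pred):
    """L1 distance between the soft label and the prediction."""
    return sum(abs(t - p) for t, p in zip(target, pred))

def euclidean(target, pred):
    """L2 distance between the soft label and the prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)))

# Hypothetical binary items: (name, soft gold label, predicted distribution)
cases = [
    ("split", [0.5, 0.5], [0.5, 0.5]),   # annotators split 50/50, prediction matches exactly
    ("skewed", [0.9, 0.1], [0.8, 0.2]),  # mild disagreement, prediction is slightly off
]

for name, t, p in cases:
    print(name,
          round(cross_entropy(t, p), 3),
          round(manhattan(t, p), 3),
          round(euclidean(t, p), 3))
# split 0.693 0.0 0.0
# skewed 0.362 0.2 0.141
```

Both distances are zero for the exact match and positive for the mismatch, while Cross Entropy ranks the exact match as the worse of the two. In the binary case, with gold label (t, 1 - t) and prediction (p, 1 - p), the Manhattan distance reduces to 2|t - p| and the Euclidean distance to sqrt(2)|t - p|. Both are therefore monotone in the same quantity |t - p|.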

Keywords

Language and Linguistics, Education, Library and Information Sciences, Linguistics and Language

Citation

Rizzi, G, Leonardelli, E, Poesio, M, Uma, A, Pavlovic, M, Paun, S, Rosso, P & Fersini, E 2024, Soft metrics for evaluation with disagreements: an assessment. in G Abercrombie, V Basile, D Bernardi, S Dudy, S Frenda, L Havens & S Tonelli (eds), 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings. European Language Resources Association (ELRA), pp. 84-94, 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024, Torino, Italy, 21/05/24. https://aclanthology.org/2024.nlperspectives-1.9