Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets

Peinelt, Nicole; Liakata, Maria; Nguyen, Dong

doi:https://doi.org/10.18653/v1/P19-1268

Files

P19_1268.pdf (369.21 KB)

Publication date

2019-07-28

Authors

Peinelt, Nicole

Liakata, Maria

Nguyen, Dong

DOI

https://doi.org/10.18653/v1/P19-1268

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY

Abstract

Existing datasets for scoring text pairs in terms of semantic similarity contain instances whose resolution differs according to the degree of difficulty. This paper proposes to distinguish obvious from non-obvious text pairs based on superficial lexical overlap and ground-truth labels. We characterise existing datasets in terms of containing difficult cases and find that recently proposed models struggle to capture the non-obvious cases of semantic similarity. We describe metrics that emphasise cases of similarity which require more complex inference and propose that these are used for evaluating systems for semantic similarity.
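The split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word-level Jaccard overlap, the median threshold, and the names `jaccard` and `categorise` are assumptions chosen for illustration, since this record does not state which overlap measure or cut-off the paper uses. The idea shown is only that a pair counts as obvious when its lexical overlap agrees with its label (high overlap and similar, or low overlap and dissimilar) and as non-obvious otherwise.

```python
from statistics import median


def jaccard(text_a: str, text_b: str) -> float:
    """Word-level Jaccard overlap (assumed measure, not taken from the paper)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def categorise(pairs):
    """Label each (text_a, text_b, label) triple as obvious or non-obvious.

    label is 1 for a similar pair and 0 for a dissimilar pair.
    The threshold is the median overlap of the given pairs (an assumption).
    """
    overlaps = [jaccard(a, b) for a, b, _ in pairs]
    threshold = median(overlaps)
    categories = []
    for (_, _, label), overlap in zip(pairs, overlaps):
        high_overlap = overlap > threshold
        # Obvious: surface overlap points the same way as the gold label.
        obvious = high_overlap if label == 1 else not high_overlap
        prefix = "obvious" if obvious else "non-obvious"
        suffix = "positive" if label == 1 else "negative"
        categories.append(f"{prefix}-{suffix}")
    return categories


# Hypothetical example pairs, invented for this sketch.
example_pairs = [
    ("how do I learn python", "how do I learn python quickly", 1),
    ("what is the capital of france", "which city is france's seat of government", 1),
    ("how do I cook rice", "how do I cook pasta", 0),
    ("best laptop for students", "weather in london today", 0),
]

if __name__ == "__main__":
    for pair, category in zip(example_pairs, categorise(example_pairs)):
        print(category, "|", pair[0], "|", pair[1])
```

Under these assumptions, a system could be scored separately on the non-obvious subsets, which is the kind of evaluation the abstract argues for.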

Citation

Peinelt, N, Liakata, M & Nguyen, D 2019, Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets. in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, pp. 2792-2798. https://doi.org/10.18653/v1/P19-1268

URI

https://dspace.library.uu.nl/handle/1874/389909
