Introducing CAD: the Contextual Abuse Dataset

Vidgen, Bertie; Nguyen, Dong; Margetts, Helen; Rossini, Patricia; Tromble, Rebekah

doi:10.18653/v1/2021.naacl-main.182

Files

2021.naacl_main.182.pdf (310.9 KB)

Publication date

2021-06

Authors

Vidgen, Bertie

Nguyen, Dong

Margetts, Helen

Rossini, Patricia

Tromble, Rebekah

Editors

Toutanova, Kristina

Rumshisky, Anna

Zettlemoyer, Luke

Hakkani-Tur, Dilek

Beltagy, Iz

Bethard, Steven

Cotterell, Ryan

Chakraborty, Tanmoy

Zhou, Yichao

DOI

https://doi.org/10.18653/v1/2021.naacl-main.182

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY

Abstract

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high quality and detailed datasets. We introduce a new dataset of primarily English Reddit entries which addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales and (4) uses an expert-driven group-adjudication process for high quality annotations. We report several baseline models to benchmark the work of future researchers. The annotated dataset, annotation guidelines, models and code are freely available.

Citation

Vidgen, B, Nguyen, D, Margetts, H, Rossini, P & Tromble, R 2021, Introducing CAD: the Contextual Abuse Dataset. in K Toutanova, A Rumshisky, L Zettlemoyer, D Hakkani-Tur, I Beltagy, S Bethard, R Cotterell, T Chakraborty & Y Zhou (eds), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp. 2289-2303. https://doi.org/10.18653/v1/2021.naacl-main.182

URI

https://dspace.library.uu.nl/handle/1874/415040
