Assessing the Reliability of Word Embedding Gender Bias Measures

Du, Yupei; Fang, Qixiang; Nguyen, Dong

DOI: https://doi.org/10.48550/arXiv.2109.04732

Files

2109.04732v1.pdf (1.62 MB)

Publication date

2021-09-10

Authors

Du, Yupei

Fang, Qixiang

Nguyen, Dong

DOI

https://doi.org/10.48550/arXiv.2109.04732

Document Type

Working paper (preprint)

Collections

Utrecht University Repository

License

CC BY

Abstract

Various measures have been proposed to quantify human-like social biases in word embeddings. However, bias scores based on these measures can suffer from measurement error. One indication of measurement quality is reliability, concerning the extent to which a measure produces consistent results. In this paper, we assess three types of reliability of word embedding gender bias measures, namely test-retest reliability, inter-rater consistency and internal consistency. Specifically, we investigate the consistency of bias scores across different choices of random seeds, scoring rules and words. Furthermore, we analyse the effects of various factors on these measures' reliability scores. Our findings inform better design of word embedding gender bias measures. Moreover, we urge researchers to be more critical about the application of such measures.

Keywords

cs.CL, SDG 5 - Gender Equality

Citation

Du, Y, Fang, Q & Nguyen, D 2021 'Assessing the Reliability of Word Embedding Gender Bias Measures' arXiv, pp. 1-23. https://doi.org/10.48550/arXiv.2109.04732

URI

https://dspace.library.uu.nl/handle/1874/415132
