Words Matter: A Computational Toolkit For Charged Terms

Brate, Ryan James

doi:https://doi.org/10.33540/2914

Publication date

2025-07-10

Authors

Brate, Ryan James

Supervisors

van den Bosch, Antal

Hollink, L.

van Erp, M.G.J.

DOI

https://doi.org/10.33540/2914

Document Type

Dissertation

Collections

Utrecht University Repository

License

No license information available

Abstract

This thesis investigates word-level biases, employing computational linguistics methods to support decolonisation efforts within cultural heritage institutions. Museum catalogues often contain contested terminology shaped by colonial legacies. Identifying and retrospectively handling such terms, and the negative biases they potentially propagate, is a key activity in current decolonisation initiatives at museums in the Western world. The research develops and demonstrates the utility of computational methods for detecting and analysing the biases of contested and potentially contested terms, with the goal of providing interpretable insights to heritage professionals. Through a series of studies spanning historical newspapers, literary fiction, and social media, the thesis proposes methodologies and supporting pipelines that identify the key behaviours, attributes, received behaviours, and linguistic markers associated with known problematic terms, presenting these as core vectors of social bias for interpretation. Outcomes are shown to align well with known biases of well-recognised problematic terminology. In addition to surface-level context features, the research explores proxy signals for prejudicial narratives, specifically offering empirical support for the phenomenon of aporophobia (disdain for poverty) by revealing the disproportionate association of low socio-economic contexts with negatively connoted topics. The thesis also introduces the ConConCor dataset, which consists of multi-sentence contexts annotated for offensiveness and offers a foundation for future studies into subjective judgments of harm in contested language. Overall, the research provides a methodological and conceptual framework for uncovering latent biases in cultural data, equipping institutions with tools to support decolonisation efforts.

Keywords

NLP, language models, linguistic variations, sociolinguistics, corpus linguistics, structural causal modelling, decolonisation, context analysis

Citation

Brate, R J 2025, 'Words Matter: A Computational Toolkit For Charged Terms', Doctor of Philosophy, Universiteit Utrecht. https://doi.org/10.33540/2914

URI

https://dspace.library.uu.nl/handle/1874/462827
