Words Matter: A Computational Toolkit For Charged Terms
Publication date
2025-07-10
Authors
Brate, Ryan James
Supervisors
van den Bosch, Antal
Hollink, L.
Marieke van Erp, M.G.J.
Document Type
Dissertation
License
No license information available
Abstract
This thesis investigates word-level biases, applying computational linguistics methods to support decolonisation efforts within cultural heritage institutions. Museum catalogues often contain contested terminology shaped by colonial legacies. Identifying and retrospectively handling such terminology, together with the negative biases it may propagate, is a key activity in current decolonisation initiatives at museums in the Western world. The research develops computational methods for detecting and analysing the biases of contested and potentially contested terms, and demonstrates their utility, with the goal of providing interpretable insights to heritage professionals. Through a series of studies spanning historical newspapers, literary fiction, and social media, the thesis proposes methodologies and supporting pipelines that identify the key behaviours, attributes, received behaviours, and linguistic markers associated with known problematic terms, and presents these as core vectors of social bias for interpretation. Outcomes are shown to align well with the known biases of well-recognised problematic terminology. Beyond surface-level context features, the research explores proxy signals for prejudicial narratives, specifically offering empirical support for the phenomenon of aporophobia (disdain for poverty) by revealing the disproportionate association of low socio-economic contexts with negatively connoted topics. The thesis also introduces the ConConCor dataset, a collection of multi-sentence contexts annotated for offensiveness, which offers a foundation for future studies of subjective judgments of harm in contested language. Overall, the research provides a methodological and conceptual framework for uncovering latent biases in cultural data, equipping institutions with tools to help facilitate decolonisation efforts.
Keywords
NLP, language models, linguistic variations, sociolinguistics, corpus linguistics, structural causal modelling, decolonisation, context analysis
Citation
Brate, R J 2025, 'Words Matter: A Computational Toolkit For Charged Terms', Doctor of Philosophy, Universiteit Utrecht. https://doi.org/10.33540/2914