The effect of measurement error on clustering

Publication date

2025-10

Authors

Pankowska, Paulina
Oberski, Daniel Leonard (ORCID 0000-0001-7467-2297; ISNI 0000000396652603)
Garnier-Villarreal, Mauricio
Pavlopoulos, Dimitris

Document Type

Article
Open Access

License

CC BY

Abstract

Clustering is a set of statistical techniques widely applied in the social sciences. While an important and useful tool, traditional clustering techniques tend to assume that the data are free from measurement error, which is often an unrealistic assumption. In this paper, we perform a Monte Carlo study to investigate the sensitivity of different clustering techniques to measurement error. We focus on three commonly used approaches: latent profile analysis (LPA), hierarchical clustering using Ward’s method, and k-means. We examine how the error affects the interpretability of the clusters and the classification of observations into clusters. Our results indicate that LPA fares better in the presence of error. In fact, clustering results from LPA can still be trusted when there is random error affecting one variable. K-means and Ward’s method, on the other hand, appear to ‘break down’ already when random error affects one variable, leading to inaccurate classifications. When the error is systematic and/or affects more variables, all clustering methods produce severely biased results.
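The kind of experiment the abstract describes can be illustrated with a very small sketch. This is not the authors' simulation design: it covers only k-means (LPA and Ward's method are omitted), uses a hand-written Lloyd's algorithm from the Python standard library, and every numeric setting (two clusters, 200 observations each, cluster centers, an error standard deviation of 6 on the first variable, the seed) is an illustrative assumption rather than a value from the paper. All function names are hypothetical.

```python
import random


def simulate(n_per_cluster, centers, rng):
    """Draw points around each center with unit-variance Gaussian noise."""
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([rng.gauss(mu, 1.0) for mu in center])
            labels.append(label)
    return points, labels


def add_random_error(points, var_index, error_sd, rng):
    """Return a copy with zero-mean Gaussian error added to one variable."""
    noisy = [row[:] for row in points]
    for row in noisy:
        row[var_index] += rng.gauss(0.0, error_sd)
    return noisy


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_starts=10, max_iter=100):
    """Lloyd's algorithm with random restarts; keeps the lowest-inertia run."""
    best_inertia, best_assign = None, None
    for _ in range(n_starts):
        centroids = [p[:] for p in rng.sample(points, k)]
        assign = None
        for _ in range(max_iter):
            new_assign = [
                min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                for p in points
            ]
            if new_assign == assign:
                break  # assignments stable: converged
            assign = new_assign
            for j in range(k):
                members = [p for p, a in zip(points, assign) if a == j]
                if members:  # keep the old centroid if a cluster empties
                    centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        inertia = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))
        if best_inertia is None or inertia < best_inertia:
            best_inertia, best_assign = inertia, assign
    return best_assign


def two_cluster_accuracy(true_labels, assigned):
    """Share of correctly classified points, allowing for label switching (k = 2 only)."""
    agree = sum(t == a for t, a in zip(true_labels, assigned)) / len(true_labels)
    return max(agree, 1.0 - agree)


rng = random.Random(2025)  # assumed seed, for reproducibility only
clean, truth = simulate(200, [(0.0, 0.0), (4.0, 4.0)], rng)
noisy = add_random_error(clean, var_index=0, error_sd=6.0, rng=rng)

acc_clean = two_cluster_accuracy(truth, kmeans(clean, 2, rng))
acc_noisy = two_cluster_accuracy(truth, kmeans(noisy, 2, rng))
print(f"accuracy without error: {acc_clean:.3f}")
print(f"accuracy with random error on one variable: {acc_noisy:.3f}")
```

Under these assumed settings, the error inflates the variance of the first variable enough that k-means tends to split the data along that variable rather than along the true cluster structure, which is one plausible mechanism behind the loss of classification accuracy. A full Monte Carlo study would repeat this over many replications and error conditions.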

Keywords

Clustering, k-means, Latent profile analysis (LPA), Measurement error, Ward’s method, Statistics and Probability, General Social Sciences

Citation

Pankowska, P, Oberski, D, Garnier-Villarreal, M & Pavlopoulos, D 2025, 'The effect of measurement error on clustering', Quality and Quantity, vol. 59, no. 5, pp. 4825-4860. https://doi.org/10.1007/s11135-025-02177-9