Using hidden Markov models to assess and correct for measurement error in digital trace data

Publication date

2026

Authors

Pankowska, Paulina K. (ORCID 0000-0001-6226-6814; ISNI 000000049291217X)
Cernat, Alexandru
Keusch, Florian
Bach, Ruben

Document Type

Article

License

taverne

Abstract

Digital trace data are increasingly used across the social and behavioral sciences. They allow researchers to access large volumes of highly detailed and continuous information. Such scale and speed cannot be achieved when using traditional sources, such as surveys. Digital traces are also believed to overcome some of the limitations that surveys are criticized for. However, while their use undoubtedly presents researchers with new possibilities, it also introduces new quality challenges that have been increasingly acknowledged. Accounting for these limitations is crucial, as they can lead to biased results and incorrect research findings. Therefore, in this paper, we apply hidden Markov models (HMMs) to digital trace data on Facebook use to assess the nature and incidence of error in measures of Facebook use frequency. HMMs are an attractive method that allows for the estimation and correction of error without the availability of (error-free) gold-standard data, if the assumptions regarding the underlying construct of interest and the nature of the error are met. Our results suggest that the measures derived from digital trace data severely underestimate the frequency of Facebook use for a third of our sample, in particular when not all relevant devices are tracked.

Keywords

Taverne, Communication

Citation

Pankowska, P, Cernat, A, Keusch, F & Bach, R 2026, 'Using hidden Markov models to assess and correct for measurement error in digital trace data', Communication Methods and Measures, vol. 20, no. 1, pp. 78-102. https://doi.org/10.1080/19312458.2025.2573265