Dealing with missing data using the Heckman selection model: methods primer for epidemiologists

Publication date

2023-02-01

Authors

Munoz Avila, Johanna
Hufstedler, Heather
Gustafson, Paul
Bärnighausen, Till
de Jong, Valentijn (ORCID 0000-0001-9921-3468)
Debray, Thomas (ORCID 0000-0002-1790-2719, ISNI 0000000390283878)

Document Type

Article

License

taverne

Abstract

Missing data is a common problem in epidemiologic studies and is often addressed by omitting incomplete records or adopting multiple imputation. Although these methods can produce unbiased estimates of study associations, their validity becomes problematic when data are missing not at random (MNAR), and the missing data mechanism is nonignorable. This situation typically arises when the presence of missing values depends on characteristics of the measurement or recording process, which is common in surveys and databases with electronic healthcare records. In this article, we discuss the relevance and implementation of Heckman selection models to impute variables that are missing not at random.

Keywords

Heckman selection model, exclusion restriction variables, selection bias, missing data, causal inference, real world data, Taverne, Epidemiology

Citation

Muñoz, J, Hufstedler, H, Gustafson, P, Bärnighausen, T, De Jong, V M T & Debray, T P A 2023, 'Dealing with missing data using the Heckman selection model: methods primer for epidemiologists', International Journal of Epidemiology, vol. 52, no. 1, pp. 5-13. https://doi.org/10.1093/ije/dyac237