Multiple imputation for multilevel data with continuous and binary variables

Publication date

2018-05-01

Authors

Audigier, Vincent
White, Ian R.
Jolani, Shahab (ISNI 0000000397105775)
Debray, Thomas P.A.
Quartagno, Matteo
Carpenter, James
Buuren, Stef van (ORCID 0000-0003-1098-2119, ISNI 0000000032712898)
Resche-Rigon, Matthieu

Document Type

Article
Open Access

Abstract

We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.
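
The distinction between systematically and sporadically missing values can be illustrated with a minimal sketch. It assumes the usual reading of the terms: a variable is systematically missing in a cluster when it is unobserved for every individual of that cluster, and sporadically missing when it is unobserved for only some of them. The function name, the "study" cluster key and the toy "bmi" records are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def classify_missingness(rows, variable, cluster_key="study"):
    """Label each cluster's missingness pattern for one variable.

    Hypothetical helper for illustration only: a value of None
    (or an absent key) is treated as missing.
    """
    missing_flags = defaultdict(list)
    for row in rows:
        missing_flags[row[cluster_key]].append(row.get(variable) is None)

    pattern = {}
    for cluster, flags in missing_flags.items():
        if all(flags):
            # Unobserved for every individual in the cluster.
            pattern[cluster] = "systematic"
        elif any(flags):
            # Unobserved for some individuals only.
            pattern[cluster] = "sporadic"
        else:
            pattern[cluster] = "complete"
    return pattern


# Toy data: three clusters (studies), one variable.
rows = [
    {"study": "A", "bmi": 24.1},
    {"study": "A", "bmi": None},
    {"study": "B", "bmi": None},
    {"study": "B", "bmi": None},
    {"study": "C", "bmi": 27.3},
]
pattern = classify_missingness(rows, "bmi")
print(pattern)
```

In this toy example, study A has sporadically missing values, study B has systematically missing values, and study C is fully observed.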

Keywords

Fully conditional specification, Joint modelling, Missing data, Mixed data, Multilevel data, Multiple imputation, Systematically missing values, Statistics and Probability, General Mathematics, Statistics, Probability and Uncertainty

Citation

Audigier, V, White, I R, Jolani, S, Debray, T P A, Quartagno, M, Carpenter, J, van Buuren, S & Resche-Rigon, M 2018, 'Multiple imputation for multilevel data with continuous and binary variables', Statistical Science, vol. 33, no. 2, pp. 160-183. https://doi.org/10.1214/18-STS646