Multiple imputation of discrete and continuous data by fully conditional specification
Publication date
2010-06-08T09:34:21Z
Authors
Buuren, S. van
Document Type
Article
Abstract
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data.
To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty
about this structure, and include any knowledge about the process that generated the missing data. Two
approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification
(FCS). JM is based on parametric statistical theory, and leads to imputation procedures whose statistical
properties are known. JM is theoretically sound, but the joint model may lack flexibility needed to represent
typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that
specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS
provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish.
Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and
compares the approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls
that had missing data on menarche (two categories), breast development (five categories) and pubic hair
development (six stages). Imputations for these data were created under two models: a multivariate normal
model with rounding and a conditionally specified discrete model. The JM approach introduced biases
in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied
flexible alternative to JM when no convenient and realistic joint distribution can be specified.
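
The FCS idea described in the abstract, a series of conditional models with one per incomplete variable, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `fcs_impute` and `multiply_impute` are hypothetical, every conditional model is assumed to be a linear regression with normal residuals, and the uncertainty in the regression parameters is ignored for brevity. For categorical variables such as those in the pubertal development data, a logistic or ordered-category model would take the place of the regression step.

```python
import numpy as np

def fcs_impute(data, n_iter=10, seed=0):
    """Return one completed copy of `data` (2-D float array, NaN = missing).

    Assumption: each column has at least one observed value, and each
    conditional model is a linear regression on all other columns.
    """
    rng = np.random.default_rng(seed)
    X = data.copy()
    missing = np.isnan(data)

    # Start by filling each gap with a random draw from the observed values.
    for j in range(X.shape[1]):
        observed = data[~missing[:, j], j]
        X[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())

    for _ in range(n_iter):
        # Visit every incomplete variable in turn.
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            obs = ~missing[:, j]
            # Conditional model for column j given the current values of the others.
            others = np.delete(X, j, axis=1)
            design = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - design[obs] @ beta
            dof = max(obs.sum() - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Impute as prediction plus residual noise, so variability is kept.
            noise = rng.normal(0.0, sigma, size=(~obs).sum())
            X[~obs, j] = design[~obs] @ beta + noise
    return X

def multiply_impute(data, m=5, n_iter=10):
    """Create m completed data sets, each from a different random seed."""
    return [fcs_impute(data, n_iter=n_iter, seed=s) for s in range(m)]
```

Each of the `m` completed data sets would then be analysed separately and the results pooled, which is how multiple imputation carries the uncertainty about the missing values into the final estimates.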