Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods

Leeuwenberg, Artuur M; van Smeden, Maarten; Langendijk, Johannes A; van der Schaaf, Arjen; Mauer, Murielle E; Moons, Karel G M; Reitsma, Johannes B; Schuit, Ewoud

doi:https://doi.org/10.1186/s41512-021-00115-5

Files

s41512_021_00115_5.pdf (2.29 MB)

Publication date

2022-01-11

Authors

Leeuwenberg, Artuur M

van Smeden, Maarten

Langendijk, Johannes A

van der Schaaf, Arjen

Mauer, Murielle E

Moons, Karel G M

Reitsma, Johannes B

Schuit, Ewoud

DOI

https://doi.org/10.1186/s41512-021-00115-5

Document Type

Article

Collections

UMC Repository

License

CC BY

Abstract

BACKGROUND: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate.

METHODS: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations.

RESULTS: In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R², intercept, slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout.

CONCLUSIONS: Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors.
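
The instability of predictor selection described in the abstract can be explored with a small simulation. The sketch below is not the authors' code or simulation design. It assumes equicorrelated standard-normal predictors with equal true effects, fits an L1-penalized (Lasso) logistic regression by plain proximal gradient descent, and summarizes stability as the mean pairwise Jaccard similarity of the selected predictor sets across repeated datasets. The function names (simulate, lasso_logistic, selection_stability) and all parameter values (n, p, lam, reps, the effect size 0.3) are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate(n, p, rho, rng):
    """Draw p equicorrelated standard-normal predictors and a binary outcome."""
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)  # unit variances, correlation rho
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.full(p, 0.3)  # assumption: every predictor has the same true effect
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))
    return X, y


def lasso_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    Returns the weight vector [intercept, slopes...]; the intercept is unpenalized.
    """
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])
    # Step size 1/L, with L = ||Xa||_2^2 / (4n) bounding the curvature of the mean log-loss.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    w = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ w)))
        w = w - step * (Xa.T @ (prob - y)) / n
        # Soft-threshold the slopes only.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w


def selection_stability(rho, n=200, p=6, lam=0.05, reps=20, seed=0):
    """Mean pairwise Jaccard similarity of selected predictor sets over repeated datasets."""
    rng = np.random.default_rng(seed)
    selected = []
    for _ in range(reps):
        X, y = simulate(n, p, rho, rng)
        selected.append(lasso_logistic(X, y, lam)[1:] != 0.0)
    scores = []
    for i in range(reps):
        for j in range(i + 1, reps):
            union = np.sum(selected[i] | selected[j])
            inter = np.sum(selected[i] & selected[j])
            scores.append(1.0 if union == 0 else inter / union)
    return float(np.mean(scores))


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        print(f"rho={rho}: stability={selection_stability(rho):.2f}")
```

A value near 1 means nearly the same predictors are selected in every repetition; lower values mean the selected set changes from dataset to dataset. The numbers depend on the assumed settings and should not be read as a reproduction of the paper's results.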

Keywords

Journal Article

Citation

Leeuwenberg, A M, van Smeden, M, Langendijk, J A, van der Schaaf, A, Mauer, M E, Moons, K G M, Reitsma, J B & Schuit, E 2022, 'Performance of binary prediction models in high-correlation low-dimensional settings : a comparison of methods', Diagnostic and Prognostic Research, vol. 6, 1. https://doi.org/10.1186/s41512-021-00115-5

URI

https://dspace.library.uu.nl/handle/1874/446309
