Instability of the AUROC of Clinical Prediction Models

van Leeuwen, Florian D.; Steyerberg, Ewout W.; van Klaveren, David; Wessler, Ben; Kent, David M.; van Zwet, Erik W.

doi:https://doi.org/10.1002/sim.70011



Publication date

2025-02-28

Authors

van Leeuwen, Florian D.

Steyerberg, Ewout W.

van Klaveren, David

Wessler, Ben

Kent, David M.

van Zwet, Erik W.

DOI

https://doi.org/10.1002/sim.70011

Document Type

Article


Collections

Utrecht University Repository

License

CC BY-NC-ND

Abstract

Background: External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Besides model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.

Methods: The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation $\tau$ among the AUCs. Since the majority of these meta-analyses include only a handful of validations, the resulting estimates of $\tau$ are very poor. So, instead of focusing on a single CPM, we estimated a log-normal distribution of $\tau$ across all 469 CPMs. We then used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed and random-effects meta-analyses.

Results: The 469 CPMs included in our study had a median of 2 external validations with an IQR of [1–3]. The estimated distribution of $\tau$ had a mean of 0.055 and a standard deviation of 0.015. If $\tau = 0.05$, then the 95% prediction interval for the AUC in a new setting has a width of at least $\pm 0.1$, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for $\tau$ in a Bayesian approach achieved near nominal coverage.

Conclusion: Due to large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
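The lower bound on the prediction interval quoted in the Results can be checked with a few lines of arithmetic. The sketch below is not the authors' empirical Bayes method: it assumes a plain normal approximation in which the AUC in a new setting varies around the pooled AUC with variance equal to the squared between-study standard deviation plus the squared standard error of the pooled estimate. The function name `prediction_half_width` and the example standard errors are hypothetical, chosen only for illustration.

```python
import math

# 97.5% quantile of the standard normal distribution (two-sided 95% interval).
Z_95 = 1.959963984540054


def prediction_half_width(tau, se_pooled=0.0, z=Z_95):
    """Half-width of an approximate 95% prediction interval for a new-setting AUC.

    Assumes a normal approximation with total variance tau^2 + se_pooled^2,
    where tau is the between-study standard deviation and se_pooled is the
    standard error of the pooled AUC. Both are treated as known here.
    """
    return z * math.sqrt(tau ** 2 + se_pooled ** 2)


tau = 0.05

# With infinitely many validations the pooled standard error goes to zero,
# but the heterogeneity term remains, so the interval cannot shrink below this.
floor = prediction_half_width(tau)
print(f"irreducible half-width: {floor:.3f}")  # irreducible half-width: 0.098

# Hypothetical pooled standard errors, for illustration only.
for se in (0.05, 0.02, 0.0):
    print(f"se_pooled={se:.2f} -> half-width {prediction_half_width(tau, se):.3f}")
```

Under these assumptions the half-width never drops below about 0.098, which matches the abstract's statement that the interval is at least roughly plus or minus 0.1 wide when the between-study standard deviation is 0.05, regardless of how many validations are available.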

Keywords

CPM, clinical prediction models, empirical Bayes, heterogeneity, meta-analysis, Epidemiology, Statistics and Probability, SDG 3 - Good Health and Well-being

Citation

van Leeuwen, F D, Steyerberg, E W, van Klaveren, D, Wessler, B, Kent, D M & van Zwet, E W 2025, 'Instability of the AUROC of Clinical Prediction Models', Statistics in Medicine, vol. 44, no. 5, e70011. https://doi.org/10.1002/sim.70011

URI

https://dspace.library.uu.nl/handle/1874/479741
