Instability of the AUROC of Clinical Prediction Models
Publication date
2025-02-28
Document Type
Article
License
CC BY-NC-ND
Abstract
Background: External validation is essential for assessing the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, which is commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.
Methods: The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation $\tau$ among the AUCs. Because most of these meta-analyses include only a handful of validations, the resulting estimates of $\tau$ are very poor. So, instead of focusing on a single CPM, we estimated a log-normal distribution of $\tau$ across all 469 CPMs and used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed-effect and random-effects meta-analyses.
Results: The 469 CPMs included in our study had a median of 2 external validations (IQR 1–3). The estimated distribution of $\tau$ had a mean of 0.055 and a standard deviation of 0.015. If $\tau = 0.05$, then the 95% prediction interval for the AUC in a new setting has a width of at least $\pm 0.1$, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for $\tau$ in a Bayesian approach achieved near-nominal coverage.
Conclusion: Because of the large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. Existing methods underestimate this uncertainty. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
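The lower bound on the prediction interval width quoted in the Results can be checked with a few lines of arithmetic. The sketch below is illustrative and is not the authors' code. It assumes a normal approximation on the AUC scale, in which the 95% prediction interval has half-width 1.96 times the square root of the sum of the squared between-study standard deviation and the squared standard error of the pooled AUC. It also assumes that the reported mean (0.055) and standard deviation (0.015) describe the log-normal distribution on its natural scale, so that the log-scale parameters follow by moment matching. The function names and the standard error of 0.03 are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def prediction_half_width(tau, se_pooled=0.0, z=Z_95):
    """Half-width of an approximate 95% prediction interval for the AUC
    in a new setting (normal approximation on the AUC scale, an assumption)."""
    return z * math.sqrt(tau ** 2 + se_pooled ** 2)


def lognormal_params(mean, sd):
    """Moment-match a log-normal distribution: return (mu, sigma) of the
    log of the between-study standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


# With unlimited validations the pooled standard error goes to 0, so the
# half-width cannot fall below 1.96 * 0.05, which is about 0.098.
floor_half_width = prediction_half_width(0.05)

# With few validations the pooled estimate is itself uncertain
# (0.03 is a hypothetical standard error), so the interval is wider.
few_half_width = prediction_half_width(0.05, se_pooled=0.03)

# Log-scale parameters implied by the reported mean and standard deviation.
mu, sigma = lognormal_params(0.055, 0.015)

print(f"floor half-width: {floor_half_width:.3f}")
print(f"half-width with hypothetical SE 0.03: {few_half_width:.3f}")
print(f"log-normal mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Under these assumptions, the between-study term alone keeps the half-width near 0.1 however many validations are pooled, which is the irreducible uncertainty the abstract refers to.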
Keywords
CPM, clinical prediction models, empirical Bayes, heterogeneity, meta-analysis, Epidemiology, Statistics and Probability, SDG 3 - Good Health and Well-being
Citation
van Leeuwen, F D, Steyerberg, E W, van Klaveren, D, Wessler, B, Kent, D M & van Zwet, E W 2025, 'Instability of the AUROC of Clinical Prediction Models', Statistics in Medicine, vol. 44, no. 5, e70011. https://doi.org/10.1002/sim.70011