Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models
Publication date
2023-02
Document Type
Article
License
cc_by
Abstract
BACKGROUND AND OBJECTIVES: We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS: We searched PubMed for articles published between 01/01/2018 and 31/12/2019 that described the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS: We included 152 studies: 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forest (n = 73/522, 14% [95% CI 11.3-17.2]). Values for the area under the Receiver Operating Characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often not reported (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION: Our review revealed that focus is required on handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.
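The abstract reports each proportion with a 95% confidence interval but does not say how the intervals were calculated. As a rough illustration only, the sketch below uses the Wilson score interval, one standard method for a binomial proportion; that choice is an assumption, not something the abstract confirms. The function name `wilson_interval` and the fixed z-value of 1.96 are likewise illustrative.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval (lower, upper) for a binomial proportion.

    Assumption: z = 1.96 approximates the two-sided 95% normal quantile.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    # The interval is centred on a value pulled slightly towards 0.5.
    centre = (p + z2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Diagnostic studies in the abstract: 58 of 152 included studies.
low, high = wilson_interval(58, 152)
print(f"{58 / 152:.1%} [95% CI {low * 100:.1f}-{high * 100:.1f}]")
# prints "38.2% [95% CI 30.8-46.1]"
```

Under this assumption the sketch reproduces the interval reported for diagnostic studies. That agreement is suggestive, but it does not establish which method the authors used.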
Keywords
Development, Diagnosis, Predictive algorithm, Prognosis, Risk prediction, Validation, Epidemiology, Review, Journal Article
Citation
Andaur Navarro, C L, Damen, J A, van Smeden, M, Takada, T, Nijman, S W, Dhiman, P, Ma, J, Collins, G S, Bajpai, R, Riley, R D, Moons, K G & Hooft, L 2023, 'Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models', Journal of Clinical Epidemiology, vol. 154, pp. 8-22. https://doi.org/10.1016/j.jclinepi.2022.11.015