Deep learning for automatic calcium scoring in CT: Validation using multiple cardiac CT and chest CT protocols

Publication date

2020-04

Authors

van Velzen, S. G.
Lessmann, Nikolas
Velthuis, Birgitta (ORCID 0000-0002-2542-9474, ISNI 0000000395231874)
Bank, Ingrid E. M.
van den Bongard, Desiree (ISNI 0000000388003670)
Leiner, Tim (ORCID 0000-0003-1885-5499, ISNI 0000000390698205)
de Jong, Pim A. (ORCID 0000-0003-4840-6854, ISNI 0000000395539334)
Veldhuis, W. B. (ORCID 0000-0002-9798-6843, ISNI 0000000395578034)
Correa, Adolfo
Terry, James G.

Document Type

Article

License

taverne

Abstract

Background: Although several deep learning (DL) calcium scoring methods have achieved excellent performance for specific CT protocols, their performance across a range of CT examination types is unknown.

Purpose: To evaluate the performance of a DL method for automatic calcium scoring across a wide range of CT examination types and to investigate whether the method can adapt to different types of CT examinations when representative images are added to the existing training data set.

Materials and Methods: The study included 7240 participants who underwent various types of nonenhanced CT examinations that included the heart: coronary artery calcium (CAC) scoring CT, diagnostic CT of the chest, PET attenuation correction CT, radiation therapy treatment planning CT, CAC screening CT, and low-dose CT of the chest. CAC and thoracic aorta calcification (TAC) were quantified using a convolutional neural network trained with (a) 1181 low-dose chest CT examinations (baseline), (b) the baseline set supplemented with a small set of examinations of the respective type (data specific), and (c) a combination of examinations of all available types (combined). Supplemental training sets contained 199-568 CT images, depending on the calcium burden of each population. Algorithm performance was evaluated with intraclass correlation coefficients (ICCs) between DL and manual scores (Agatston score for CAC, volume score for TAC) and with linearly weighted κ values for assignment to cardiovascular disease risk categories based on the Agatston score (0, 1-10, 11-100, 101-400, >400).

Results: At baseline, the DL algorithm yielded ICCs of 0.79-0.97 for CAC and 0.66-0.98 for TAC across the different types of CT examinations. ICCs improved to 0.84-0.99 (CAC) and 0.92-0.99 (TAC) with CT protocol-specific training and to 0.85-0.99 (CAC) and 0.96-0.99 (TAC) with combined training.
For assignment of cardiovascular disease risk categories, the κ value over all test CT scans was 0.90 (95% confidence interval [CI]: 0.89, 0.91) with baseline training and increased to 0.92 (95% CI: 0.91, 0.93) with both data-specific and combined training.

Conclusion: A deep learning algorithm for quantification of coronary artery and thoracic aorta calcification was robust despite substantial differences in CT protocol and variation in subject populations. Augmenting the algorithm's training data with CT protocol-specific images further improved its performance.

© RSNA, 2020
See also the editorial by Vannier in this issue.
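The abstract measures agreement between DL and manual scoring in two ways: ICCs on the continuous scores, and linearly weighted κ on the five Agatston-based risk categories. The Python sketch below illustrates those quantities under stated assumptions; it is not the authors' implementation. It takes hypothetical per-slice lesion measurements (area in mm², peak HU) instead of segmenting images, uses the conventional Agatston density weights (1 for 130-199 HU up to 4 for ≥400 HU), treats category boundaries as upper-inclusive (so a score of 10 falls in 1-10 and 10.5 in 11-100), and uses the two-way random-effects, absolute-agreement, single-measure ICC(2,1). That ICC form is an assumption, because the abstract does not say which one was used. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Sequence


def density_weight(peak_hu: float) -> int:
    """Conventional Agatston weight from a lesion's peak attenuation (HU)."""
    if peak_hu < 130:
        return 0  # below the standard 130 HU calcium threshold
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions: Sequence[tuple[float, float]]) -> float:
    """Sum of area (mm^2) x density weight over hypothetical (area, peak HU) pairs."""
    return sum(area * density_weight(peak) for area, peak in lesions)


def risk_category(score: float) -> int:
    """Map a score to categories 0, 1-10, 11-100, 101-400, >400 (indices 0..4)."""
    if score <= 0:
        return 0
    for idx, upper in enumerate((10, 100, 400), start=1):
        if score <= upper:  # assumed upper-inclusive boundaries
            return idx
    return 4


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 5) -> float:
    """Cohen's kappa with linear agreement weights w_ij = 1 - |i - j| / (k - 1)."""
    n = len(a)
    if n == 0 or n != len(b) or k < 2:
        raise ValueError("need two equal-length, non-empty rating lists and k >= 2")
    joint, row, col = Counter(zip(a, b)), Counter(a), Counter(b)
    p_obs = p_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = 1 - abs(i - j) / (k - 1)
            p_obs += w * joint[(i, j)] / n
            p_exp += w * (row[i] / n) * (col[j] / n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def icc_2_1(x: Sequence[float], y: Sequence[float]) -> float:
    """ICC(2,1) for two raters, e.g. DL scores x and manual scores y per subject."""
    n, k = len(x), 2
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length score lists with at least 2 subjects")
    pairs = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(u + v) / k for u, v in pairs]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because ICC(2,1) measures absolute agreement, a constant offset between DL and manual scores lowers it: `icc_2_1([1, 2, 3], [2, 3, 4])` gives 2/3 even though the two score sets are perfectly correlated. Linear κ weights give partial credit to near-miss categories, so a DL result one category off counts as a closer match than one four categories off.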

Keywords

Aged, Clinical Protocols, Coronary Artery Disease/diagnostic imaging, Deep Learning, Female, Heart/diagnostic imaging, Humans, Male, Middle Aged, Retrospective Studies, Thorax/diagnostic imaging, Tomography, X-Ray Computed/methods, Vascular Calcification/diagnostic imaging, Taverne, Radiology Nuclear Medicine and imaging, Research Support, Non-U.S. Gov't, Journal Article, Research Support, N.I.H., Extramural, Validation Studies

Citation

van Velzen, S G M, Lessmann, N, Velthuis, B K, Bank, I E M, van den Bongard, D H J G, Leiner, T, de Jong, P A, Veldhuis, W B, Correa, A, Terry, J G, Carr, J J, Viergever, M A, Verkooijen, H M & Išgum, I 2020, 'Deep learning for automatic calcium scoring in CT: Validation using multiple cardiac CT and chest CT protocols', Radiology, vol. 295, no. 1, pp. 66-79. https://doi.org/10.1148/radiol.2020191621