Automatic Prediction of Recurrence of Major Cardiovascular Events: A Text Mining Study Using Chest X-Ray Reports

Publication date

2021-07-09

Authors

Bagheri, Ayoub
Groenhof, T. Katrien J.
Asselbergs, Folkert W (ORCID 0000-0002-1692-8669, ISNI 0000000391548591)
Haitjema, Saskia (ORCID 0000-0001-5465-4868)
Bots, Michiel L (ORCID 0000-0003-2871-9810, ISNI 0000000391893395)
Veldhuis, WB (ORCID 0000-0002-9798-6843, ISNI 0000000395578034)
de Jong, Pim A (ORCID 0000-0003-4840-6854, ISNI 0000000395539334)
Oberski, Daniel (ORCID 0000-0001-7467-2297)

Document Type

Article

License

CC BY

Abstract

Background and Objective. Electronic health records (EHRs) contain free-text information on symptoms, diagnosis, treatment, and prognosis of diseases. However, this potential goldmine of health information cannot be easily accessed and used unless proper text mining techniques are applied. The aim of this project was to develop and evaluate a text mining pipeline in a multimodal learning architecture to demonstrate the value of medical text classification in chest radiograph reports for cardiovascular risk prediction. We sought to assess the integration of various text representation approaches and clinical structured data with state-of-the-art deep learning methods in the process of medical text mining.

Methods. We used EHR data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representation with preprocessed clinical predictors for the prediction of recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to filter out the unwanted texts from X-ray radiology reports. Thereafter, text representation methods were used to numerically represent unstructured radiology reports with vectors. Subsequently, these text representation methods were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM deep neural network (BiLSTM).

Results. We performed various experiments to evaluate the added value of the text in the prediction of major cardiovascular events. The two main scenarios were the integration of radiology reports (1) with classical clinical predictors and (2) with only age and sex in the case of unavailable clinical predictors. In total, data of 5603 patients were used with 5-fold cross-validation to train the models. In the first scenario, the multimodal BiLSTM (MI-BiLSTM) model achieved an area under the curve (AUC) of 84.7%, misclassification rate of 14.3%, and F1 score of 83.8%. In this scenario, the SVM model, trained on clinical variables and bag-of-words representation, achieved the lowest misclassification rate of 12.2%. In the case of unavailable clinical predictors, the MI-BiLSTM model trained on radiology reports and demographic (age and sex) variables reached an AUC, F1 score, and misclassification rate of 74.5%, 70.8%, and 20.4%, respectively.

Conclusions. Using the case study of routine care chest X-ray radiology reports, we demonstrated the clinical relevance of integrating text features and classical predictors in our text mining pipeline for cardiovascular risk prediction. The MI-BiLSTM model with word embedding representation appeared to have a desirable performance when trained on text data integrated with the clinical variables from the SMART study. Our results mined from chest X-ray reports showed that models using text data in addition to laboratory values outperform those using only known clinical predictors.
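
As an illustration of two ideas named in the abstract, the sketch below builds a bag-of-words vector from a report, concatenates it with age and sex to form a multimodal input, and computes the three reported metrics (misclassification rate, F1 score, AUC). It is a minimal sketch and not the authors' implementation: the function names, the two toy reports, the 0/1 coding of sex, the 0.5 decision threshold, and the choice of binary (positive-class) F1 are all assumptions made for illustration. Stemming, word embeddings, and the neural models are not shown.

```python
import re
from collections import Counter


def tokenize(report):
    """Lowercase a report and keep alphabetic tokens only (a simple cleaning step)."""
    return re.findall(r"[a-z]+", report.lower())


def build_vocabulary(reports):
    """Assign each distinct token a column index, in alphabetical order."""
    tokens = sorted({tok for report in reports for tok in tokenize(report)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def multimodal_features(report, age, sex, vocab):
    """Concatenate bag-of-words counts with two structured predictors."""
    counts = Counter(tokenize(report))
    text_vector = [counts.get(tok, 0) for tok in vocab]
    return text_vector + [age, sex]


def misclassification_rate(y_true, y_pred):
    """Fraction of cases where the predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (assumed variant; the abstract does not specify)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy reports invented for illustration; they are not SMART study data.
reports = ["Heart size normal. No pleural effusion.", "Enlarged heart, pleural effusion."]
vocab = build_vocabulary(reports)
features = [
    multimodal_features(report, age, sex, vocab)
    for report, age, sex in zip(reports, [61, 70], [0, 1])
]

# Hypothetical true labels and model scores, thresholded at 0.5 (assumed cutoff).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

Each row of `features` could be passed to any of the classifiers listed in the abstract. In a cross-validated setup the vocabulary would normally be built from the training folds only, so that no information from held-out reports leaks into the representation.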

Keywords

Health Informatics, Biotechnology, Surgery, Biomedical Engineering, Journal Article

Citation

Bagheri, A, Groenhof, T K J, Asselbergs, F W, Haitjema, S, Bots, M L, Veldhuis, W B, de Jong, P A & Oberski, D L 2021, 'Automatic Prediction of Recurrence of Major Cardiovascular Events: A Text Mining Study Using Chest X-Ray Reports', Journal of Healthcare Engineering, vol. 2021, 6663884. https://doi.org/10.1155/2021/6663884