Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records
Publication date
2022
Document Type
Part of book
License
taverne
Abstract
The written notes in Electronic Health Records contain a vast amount of information about patients. Automated text classification on these notes requires methods that are interpretable, and topic models can serve this goal because they indicate which topics in a text are relevant to a decision. We propose a new topic modeling algorithm, FLSA-E, and compare it with another state-of-the-art algorithm, FLSA-W. In FLSA-E, topics are found by fuzzy clustering in a word embedding space. Since we use word embeddings as the basis for our clustering, we extend our evaluation with word-embedding-based evaluation metrics. We find that different evaluation metrics favour different algorithms. The results provide evidence that FLSA-E has fewer outliers in its topics, a desirable property given that within-topic words need to be semantically related.
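The idea of finding topics by fuzzy clustering in a word embedding space can be illustrated with a minimal sketch. This is not the authors' FLSA-E implementation: it assumes plain fuzzy c-means with Euclidean distance, and the two-dimensional vectors, the word list, and the names `fuzzy_c_means`, `toy_embeddings`, and `top_words` are hypothetical, chosen only for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means (an assumption, not the paper's exact method).

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from randomly chosen data points as initial centers.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    u = []
    for _ in range(n_iter):
        # Membership update: closer centers receive higher membership.
        u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
    return centers, u


# Hypothetical 2-D "embeddings"; real word vectors have hundreds of dimensions.
toy_embeddings = {
    "anxiety": [0.90, 0.10],
    "panic": [0.80, 0.20],
    "fear": [0.85, 0.15],
    "insulin": [0.10, 0.90],
    "glucose": [0.20, 0.80],
    "diabetes": [0.15, 0.85],
}
words = list(toy_embeddings)
centers, memberships = fuzzy_c_means(
    [toy_embeddings[w] for w in words], n_clusters=2
)


def top_words(k, n=3):
    """Words with the highest membership in cluster (topic) k."""
    ranked = sorted(
        range(len(words)), key=lambda i: memberships[i][k], reverse=True
    )
    return [words[i] for i in ranked[:n]]


topics = [top_words(k) for k in range(2)]
print(topics)
```

Because memberships are soft, a word can contribute to more than one topic; reading each topic as its highest-membership words is one simple way to turn the clusters into word lists.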
Keywords
Electronic Health Records, Fuzzy Clustering, Fuzzy Methods, Natural Language Processing, Neural Network methods, Psychiatry, Topic Modeling, Word Embeddings, Taverne, Electrical and Electronic Engineering, Control and Systems Engineering, Human-Computer Interaction, SDG 3 - Good Health and Well-being
Citation
Rijcken, E, Zervanou, K, Spruit, M, Mosteiro Romero, P, Scheepers, F E & Kaymak, U 2022, Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records. in IEEE International Conference on Systems, Man, and Cybernetics. Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, vol. 2022-October, IEEE, pp. 2669-2674, 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022, Prague, Czech Republic, 9/10/22. https://doi.org/10.1109/SMC53654.2022.9945594