A robust unsupervised method for outlier set detection
Publication date
2025-11-04
Document Type
Article
License
taverne
Abstract
This paper proposes a robust method that identifies sets of points that collectively deviate from typical patterns in a dataset, which it calls “outlier sets”, while excluding individual points from detection. This new methodology, Outlier Set Two-step Identification (OSTI), employs a two-step approach to detect and label these outlier sets. First, it uses Gaussian Mixture Models for probabilistic clustering, identifying candidate outlier sets as clusters whose weights fall below a hyperparameter threshold. Second, OSTI measures the inter-cluster Mahalanobis distance between each candidate outlier set's centroid and the overall dataset mean. OSTI then tests the null hypothesis that this distance does not significantly differ from its theoretical chi-square distribution, enabling the formal detection of outlier sets. We test OSTI systematically on 8000 synthetic 2D datasets across various inlier configurations and thousands of possible outlier set characteristics. Results show that OSTI robustly and consistently detects outlier sets, with an average F1 score of 0.92 and an average purity (the degree to which the outlier sets identified correspond to those generated synthetically, i.e., our ground truth) of 98.58%. We also compare OSTI with state-of-the-art outlier detection methods to illustrate how OSTI fills a gap as a tool for the exclusive detection of outlier sets.
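The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `detect_outlier_sets`, the parameters `weight_threshold` and `alpha`, the use of the overall dataset covariance in the Mahalanobis distance, and the comparison of the squared distance against a chi-square quantile with as many degrees of freedom as the data has dimensions are all assumptions made here for illustration.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture


def detect_outlier_sets(X, n_components=2, weight_threshold=0.1, alpha=0.01, seed=0):
    """Return a boolean mask marking points that belong to detected outlier sets.

    Hypothetical sketch of a two-step procedure; all parameter names and
    defaults are assumptions, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: probabilistic clustering with a Gaussian mixture model.
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    labels = gmm.predict(X)

    # Overall dataset mean and inverse covariance (assumed reference for the distance).
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

    # Critical value of the chi-square distribution (assumed df = number of features).
    critical = chi2.ppf(1.0 - alpha, df=X.shape[1])

    flagged = np.zeros(len(X), dtype=bool)
    for k, weight in enumerate(gmm.weights_):
        # Only low-weight components are candidate outlier sets.
        if weight >= weight_threshold:
            continue
        # Step 2: squared Mahalanobis distance from the candidate centroid
        # to the overall dataset mean.
        diff = gmm.means_[k] - mean
        d2 = float(diff @ cov_inv @ diff)
        # Flag the whole candidate set if the distance exceeds the critical value.
        if d2 > critical:
            flagged |= labels == k
    return flagged
```

Because candidates are selected by mixture weight and flagged as whole components, the sketch labels groups of points rather than scoring each point individually, which mirrors the set-level focus of the abstract.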
Keywords
Gaussian mixture models, Inter-cluster Mahalanobis distance, Outlier set two-step identification (OSTI), Outlier sets, Taverne, Management Information Systems, Software, Information Systems and Management, Artificial Intelligence
Citation
Sarfraz, A, Birnbaum, A, Dolan, F, Lamontagne, J, Mihaylova, L & Rougé, C 2025, 'A robust unsupervised method for outlier set detection', Knowledge-Based Systems, vol. 329, 114274. https://doi.org/10.1016/j.knosys.2025.114274