DetCat: Detecting Categorical Outliers in Relational Datasets

Zylinski, Arthur; Qahtan, Abdulhakim A.

doi:https://doi.org/10.1145/3627673.3679212

Files

3627673.3679212.pdf (1.2 MB)

Publication date

2024-10-21

Authors

Zylinski, Arthur

Qahtan, A.A.A.

DOI

https://doi.org/10.1145/3627673.3679212

Document Type

Part of book

Collections

Utrecht University Repository

License

taverne

Abstract

Poor data quality significantly affects data analytics tasks, leading to inaccurate decisions and poor predictions by machine learning models. Outliers are among the most common data glitches that impact data quality. While detecting outliers in numerical data has been extensively studied, few attempts have been made to solve the problem of detecting categorical outliers. In this paper, we introduce DetCat for detecting categorical outliers in relational datasets by utilizing the syntactic structure of the values. For a given attribute, DetCat identifies a set of patterns that represents the majority of the values as dominating patterns. Data values that cannot be generated by the dominating patterns are declared outliers. The demo will show the effectiveness of our tool in detecting categorical outliers and discovering syntactic data patterns.
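
The idea of dominating patterns can be illustrated with a minimal sketch. This is not DetCat's actual algorithm, which the abstract does not specify. The character classes (`U`, `L`, `D`), the `coverage` threshold, and the function names `to_pattern` and `find_outliers` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def to_pattern(value: str) -> str:
    """Generalize a value to a coarse syntactic pattern (assumed scheme).

    Runs of uppercase letters become 'U+', lowercase 'L+', digits 'D+';
    punctuation and whitespace are kept literally.
    """
    def char_class(ch: str) -> str:
        if ch.isdigit():
            return "D"
        if ch.isalpha():
            return "U" if ch.isupper() else "L"
        return ch

    parts = []
    for cls, run in groupby(char_class(ch) for ch in value):
        n = len(list(run))
        # Letter/digit runs are collapsed; other characters are repeated as-is.
        parts.append(cls + "+" if cls in ("D", "U", "L") else cls * n)
    return "".join(parts)


def find_outliers(values, coverage=0.8):
    """Pick the most frequent patterns until they cover `coverage` of the
    values (hypothetical threshold), then flag values matching none of them."""
    patterns = [to_pattern(v) for v in values]
    counts = Counter(patterns)
    dominating, covered = set(), 0
    for pattern, count in counts.most_common():
        if covered >= coverage * len(values):
            break
        dominating.add(pattern)
        covered += count
    outliers = [v for v, p in zip(values, patterns) if p not in dominating]
    return dominating, outliers


# Usage: nine well-formed codes and one stray placeholder value.
column = ["AB-123", "CD-456", "EF-789", "GH-012", "XY-345",
          "ZZ-999", "QQ-111", "RS-222", "TU-333", "n/a"]
dominating, outliers = find_outliers(column)
print(dominating)  # {'U+-D+'}
print(outliers)    # ['n/a']
```

A frequency-based cutoff like this is only one way to define "the majority of the values"; a real tool would likely need finer-grained patterns and a more careful selection rule.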

Keywords

categorical values, outliers, similarity metrics, syntactic structure, Taverne, General Business, Management and Accounting, General Decision Sciences

Citation

Zylinski, A & Qahtan, A A 2024, DetCat: Detecting Categorical Outliers in Relational Datasets. In CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. International Conference on Information and Knowledge Management, Proceedings, Association for Computing Machinery, pp. 5318-5322, 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Boise, United States, 21/10/24. https://doi.org/10.1145/3627673.3679212

URI

https://dspace.library.uu.nl/handle/1874/482520
