Classification in a Skewed Online Trade Fraud Complaint Corpus
Publication date
2017-11
Editors
Verheij, Bart
Wiering, Marco
Document Type
Part of book
Abstract
This paper explores how machine learning techniques can support the handling of online trade fraud complaints in a skewed corpus by predicting whether a complaint will be withdrawn. To optimize the performance of each classifier, the influence of resampling, word weighting, and word normalization on classification performance is assessed. The results show that machine learning can indeed be used for this purpose: Logistic Regression improves on the baseline given by the skewness ratio by up to 13 percentage points (pp). Furthermore, data alteration techniques can improve classifier performance on the skewed dataset by up to 13.5 pp.
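The abstract measures classifier gains against the skewness ratio and names resampling as one way to alter skewed data. The sketch below illustrates both ideas in plain Python under stated assumptions: the baseline is taken to be the accuracy of always predicting the majority class, and resampling is shown as simple random oversampling of minority classes. The paper may define its baseline or resampling differently, and the function names here are hypothetical.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class.

    Assumption: this is how the skewness-ratio baseline is read here.
    """
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def random_oversample(docs, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one.

    Assumption: random oversampling stands in for the paper's resampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_docs, out_labels = list(docs), list(labels)
    for cls, n in counts.items():
        pool = [d for d, y in zip(docs, labels) if y == cls]
        for _ in range(target - n):
            out_docs.append(rng.choice(pool))
            out_labels.append(cls)
    return out_docs, out_labels


def improvement_pp(score, baseline):
    """Difference between two proportions, in percentage points."""
    return (score - baseline) * 100


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the paper.
    labels = ["kept"] * 8 + ["withdrawn"] * 2
    docs = [f"complaint {i}" for i in range(len(labels))]
    print(majority_baseline(labels))
    _, balanced = random_oversample(docs, labels)
    print(Counter(balanced))
    print(round(improvement_pp(0.9, majority_baseline(labels)), 1))
```

Oversampling should be applied only to the training split; evaluating on duplicated examples would inflate the reported gain over the baseline.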
Keywords
Classification, Law Enforcement, Skewed Data
Citation
Kos, W, Schraagen, M P, Brinkhuis, M J S & Bex, F J 2017, Classification in a Skewed Online Trade Fraud Complaint Corpus. in B Verheij & M Wiering (eds), Preproceedings of the 29th Benelux Conference on Artificial Intelligence November 8–9, 2017 in Groningen, The Netherlands : BNAIC 2017. pp. 172–183, The 29th Benelux Conference on Artificial Intelligence, Groningen, Netherlands, 8/11/17.