Classification in a Skewed Online Trade Fraud Complaint Corpus

Publication date

2017-11

Authors

Kos, William
Schraagen, Marijn (ISNI 0000000419454950)
Brinkhuis, Matthieu (ORCID 0000-0003-1054-6683, ISNI 0000000419480083)
Bex, Floris (ORCID 0000-0002-5699-9656, ISNI 0000000118066508)

Editors

Verheij, Bart
Wiering, Marco

Document Type

Part of book

Abstract

This paper explores how machine learning techniques can support the handling of online trade fraud complaints by predicting, on a skewed corpus, whether a complaint will be withdrawn. To optimize the performance of each classifier, the influence of resampling, word weighting, and word normalization on classification performance is assessed. The results indicate that machine learning can be used for this purpose: Logistic Regression improves on the baseline given by the skewness ratio by up to 13 percentage points (pp). Furthermore, data alteration techniques can improve classifier performance on the skewed dataset by up to 13.5 pp.
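
As a rough illustration of two notions in the abstract, the sketch below computes a majority-class baseline (the accuracy implied by the skewness ratio alone) and applies simple random oversampling to a toy corpus. The abstract does not say which resampling method the paper uses, so random oversampling is an assumption here. The function names, the class labels "kept" and "withdrawn", and the toy data are all hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class items until every class
    is as frequent as the largest one (assumed resampling variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        # All original items belonging to this class.
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


# Made-up toy corpus: 8 complaints kept, 2 withdrawn.
labels = ["kept"] * 8 + ["withdrawn"] * 2
texts = [f"complaint {i}" for i in range(len(labels))]

baseline = majority_baseline(labels)
bal_texts, bal_labels = random_oversample(texts, labels)
```

On this toy corpus the baseline is 0.8, so a classifier is only useful if it scores above that. Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.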

Keywords

Classification, Law Enforcement, Skewed Data

Citation

Kos, W, Schraagen, M P, Brinkhuis, M J S & Bex, F J 2017, Classification in a Skewed Online Trade Fraud Complaint Corpus. in B Verheij & M Wiering (eds), Preproceedings of the 29th Benelux Conference on Artificial Intelligence November 8–9, 2017 in Groningen, The Netherlands: BNAIC 2017. pp. 172–183, The 29th Benelux Conference on Artificial Intelligence, Groningen, Netherlands, 8/11/17.