A synthetic data set to benchmark anti-money laundering methods

Publication date

2023-12

Authors

Jensen, Rasmus Ingemann Tuffveson
Ferwerda, Joras (ORCID 0000-0002-8834-7935; ISNI 000000038893837X)
Jorgensen, Kristian Sand
Jensen, Erik Rathje
Borg, Martin
Krogh, Morten Persson
Jensen, Jonas Brunholm
Iosifidis, Alexandros

Document Type

Article
Open Access

License

CC BY

Abstract

Bank transactions are highly confidential. As a result, there are no real public data sets that can be used to investigate and compare anti-money laundering (AML) methods in banks. This severely limits research on important AML problems such as efficiency, effectiveness, class imbalance, concept drift, and interpretability. To address the issue, we present SynthAML: a synthetic data set to benchmark statistical and machine learning methods for AML. The data set builds on real data from Spar Nord, a systemically important Danish bank, and contains 20,000 AML alerts and over 16 million transactions. Experimental results indicate that performance on SynthAML can be transferred to the real world. As use cases, we present and discuss open problems in the AML literature.

Citation

Jensen, R I T, Ferwerda, J, Jorgensen, K S, Jensen, E R, Borg, M, Krogh, M P, Jensen, J B & Iosifidis, A 2023, 'A synthetic data set to benchmark anti-money laundering methods', Scientific Data, vol. 10, no. 1, 661, pp. 1-10. https://doi.org/10.1038/s41597-023-02569-2