A quantitative comparison of program plagiarism detection tools

Publication date

2017-11-14

Authors

Heres, Daniël
Hage, Jurriaan (ISNI 0000000356203424)

Editors

Pieterse, Vreda
van Eekelen, Marko
Giannakos, Michalis

Document Type

Part of book

License

taverne

Abstract

In this work we compare nine tools for the detection of source code plagiarism: the plagiarism or copy detection tools CPD, JPlag, Sherlock, Marble, Moss, Plaggie and SIM, and two baselines, one based on the Unix tool diff and one based on the difflib module from the Python Standard Library. We provide visualizations of the output of these tools and measure the performance of each tool on different tasks using both the F-measure and the area under the precision-recall curve (AUC-PR). For each task, we compare the tools on these metrics and identify the best-performing ones.
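
As a rough illustration of the kind of baseline and metric the abstract names, the following is a minimal sketch, not the authors' implementation. It assumes a difflib-style baseline scores each pair of submissions with `difflib.SequenceMatcher.ratio()` and that pairs at or above a chosen threshold are flagged. The submission names and contents, the ground-truth set, and the helper names `similarity` and `f_measure` are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two source texts, via difflib."""
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent elements in long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def f_measure(scores, plagiarised, threshold, beta=1.0):
    """F-measure of flagging every pair whose score is >= threshold."""
    flagged = {pair for pair, s in scores.items() if s >= threshold}
    tp = len(flagged & plagiarised)  # correctly flagged pairs
    if tp == 0:
        return 0.0
    precision = tp / len(flagged)
    recall = tp / len(plagiarised)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical submissions; names and contents are illustrative only.
submissions = {
    "alice": "int total = 0;\nfor (int i = 0; i < n; i++) total += a[i];\n",
    "bob": "int sum = 0;\nfor (int j = 0; j < n; j++) sum += a[j];\n",
    "carol": "return Arrays.stream(a).sum();\n",
}

# Score every unordered pair of submissions.
scores = {
    (x, y): similarity(submissions[x], submissions[y])
    for x, y in combinations(sorted(submissions), 2)
}

# Assumed ground truth: which pairs are actually plagiarised.
plagiarised = {("alice", "bob")}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
print("F1 at threshold 0.5:", f_measure(scores, plagiarised, 0.5))
```

Sweeping the threshold over all observed scores and recording precision and recall at each step would give the points of a precision-recall curve, whose area is the AUC-PR.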

Keywords

Empirical study, Program plagiarism detection, Quantitative comparison, Tools, Taverne, Software, Human-Computer Interaction, Computer Vision and Pattern Recognition, Computer Networks and Communications

Citation

Heres, D & Hage, J 2017, A quantitative comparison of program plagiarism detection tools. in V Pieterse, M van Eekelen & M Giannakos (eds), CSERC '17: Proceedings of the 6th Computer Science Education Research Conference. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 73-82, 6th Computer Science Education Research Conference, CSERC 2017, Helsinki, Finland, 13/11/17. https://doi.org/10.1145/3162087.3162101