Detecting Automatic Software Plagiarism via Token Sequence Normalization (ICSE 2024 - Artifact Evaluation)

Who

Timur Sağlam, Moritz Brödel, Larissa Schmid, Sebastian Hahner

Track

ICSE 2024 Artifact Evaluation

Abstract

While software plagiarism detectors have been used for decades, the assumption that evading detection requires programming proficiency is challenged by the emergence of automated plagiarism generators. These generators enable effortless obfuscation attacks, exploiting vulnerabilities in existing detectors by inserting statements to disrupt the matching of related programs. We present a novel, language-independent defense mechanism that leverages program dependence graphs, rendering such attacks infeasible. We evaluate our approach with multiple real-world datasets and show that it defeats plagiarism generators by offering broad resilience against automated obfuscation while maintaining a low rate of false positives. We thus provide a practical and efficient solution for state-of-the-art software plagiarism detectors.
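The attack and defense described in the abstract can be illustrated with a small, heavily simplified sketch. It is not the authors' implementation and not JPlag's API: the `Statement` type, the token names, and the `remove_dead_statements` helper are all hypothetical, and the sketch only handles straight-line code. Instead of building a full program dependence graph, it assumes each statement already carries the variables it defines and uses, and drops every statement that no statement with an observable effect depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One statement of a straight-line program (hypothetical model)."""
    token: str                         # abstract token type, e.g. "ASSIGN"
    defines: frozenset = frozenset()   # variables written by the statement
    uses: frozenset = frozenset()      # variables read by the statement
    critical: bool = False             # observable effect, e.g. a return


def remove_dead_statements(program):
    """Keep only statements that a critical statement transitively needs.

    Walks the program backwards and tracks which variables are still
    needed (live). A statement is kept if it is critical or if it
    defines a live variable.
    """
    live = set()
    kept = []
    for stmt in reversed(program):
        if stmt.critical or (stmt.defines & live):
            kept.append(stmt)
            live -= stmt.defines   # this definition satisfies the need
            live |= stmt.uses      # its inputs are now needed
    kept.reverse()
    return kept


def tokens(program):
    """Token sequence that a token-based detector would compare."""
    return [stmt.token for stmt in program]


original = [
    Statement("ASSIGN", frozenset({"a"})),
    Statement("ASSIGN", frozenset({"b"}), frozenset({"a"})),
    Statement("RETURN", uses=frozenset({"b"}), critical=True),
]

# Obfuscated variant: inserted statements that never affect the result.
obfuscated = [
    original[0],
    Statement("VARDEF", frozenset({"tmp"})),
    original[1],
    Statement("ASSIGN", frozenset({"tmp2"}), frozenset({"tmp"})),
    original[2],
]

normalized = tokens(remove_dead_statements(obfuscated))
```

In this toy example, the inserted statements break up the token sequence of the obfuscated variant, while the normalized sequence matches that of the original program again. Handling branches, loops, and statement reordering would require the dependence information the abstract refers to.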

More on JPlag:

Link to Preprint

https://publikationen.bibliothek.kit.edu/1000167588/152076500

DOI

https://doi.org/10.1145/3597503.3639192

Detecting Automatic Software Plagiarism via Token Sequence Normalization

Timur Sağlam

Karlsruhe Institute of Technology (KIT)

Germany

Moritz Brödel

Karlsruhe Institute of Technology (KIT)

Germany

Larissa Schmid

Karlsruhe Institute of Technology (KIT)

Germany

Sebastian Hahner

Karlsruhe Institute of Technology (KIT)

Germany
