Detecting Automatic Software Plagiarism via Token Sequence Normalization
Although software plagiarism detectors have been used for decades, the emergence of automated plagiarism generators challenges the assumption that evading detection requires programming proficiency. These generators enable effortless obfuscation attacks that exploit vulnerabilities in existing detectors by inserting statements to disrupt the matching of related programs. We present a novel, language-independent defense mechanism that leverages program dependence graphs to render such attacks infeasible. We evaluate our approach on multiple real-world datasets and show that it defeats plagiarism generators, offering broad resilience against automated obfuscation while maintaining a low rate of false positives. It thereby provides a practical and efficient solution for state-of-the-art software plagiarism detectors.
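To make the idea of the defense concrete, the following is a minimal illustrative sketch, not the authors' implementation and not JPlag's API. It assumes straight-line code, a hypothetical `Stmt` record carrying a token type plus the variables each statement defines and uses, and a hypothetical `normalize` function. It builds simple data-dependence edges, keeps only the statements that can reach an observable ("essential") statement, and returns the resulting token sequence, so that inserted statements with no effect no longer disturb the matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical statement record; all field names are assumptions for this sketch."""
    token: str                 # token type a token-based detector would emit
    defs: frozenset            # variables written by the statement
    uses: frozenset            # variables read by the statement
    essential: bool = False    # observable effect (e.g. return, output)


def names(*vs):
    """Small helper to build a frozenset of variable names."""
    return frozenset(vs)


def normalize(stmts):
    """Return the tokens of statements that can affect an essential statement."""
    # Data-dependence edges for straight-line code: each statement depends on
    # the most recent earlier definition of every variable it reads.
    last_def = {}
    deps = []
    for i, s in enumerate(stmts):
        deps.append({last_def[v] for v in s.uses if v in last_def})
        for v in s.defs:
            last_def[v] = i

    # Backward reachability from essential statements marks live statements.
    live = set()
    work = [i for i, s in enumerate(stmts) if s.essential]
    while work:
        i = work.pop()
        if i not in live:
            live.add(i)
            work.extend(deps[i])

    # Emit tokens in original order, skipping statements that are never live.
    return [s.token for i, s in enumerate(stmts) if i in live]


# x = 1; y = x + 2; return y
original = [
    Stmt("VAR_DEF", names("x"), names()),
    Stmt("VAR_DEF", names("y"), names("x")),
    Stmt("RETURN", names(), names("y"), essential=True),
]

# Same program with two inserted statements that never influence the result.
obfuscated = [
    original[0],
    Stmt("VAR_DEF", names("junk"), names()),
    original[1],
    Stmt("ASSIGN", names("junk"), names("junk", "x")),
    original[2],
]

print(normalize(original) == normalize(obfuscated))
```

In this toy setting both programs normalize to the same token sequence. A real program dependence graph also covers control dependences, branches, and loops, which this sketch deliberately omits.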
More on JPlag: