StackPlagger: A System for Identifying AI-Code Plagiarism on Stack Overflow
Identifying AI code plagiarism on technical forums like Stack Overflow (SO) is critical, as it can directly impact the platform’s trust and credibility. While previous studies have explored AI-generated code detection, they have focused on long, standalone samples from repositories and competitions. In contrast, SO snippets are often short, fragmented, and context-specific, which can make detection more challenging. Furthermore, existing methods have not adequately addressed obfuscated or adversarially prompted code that is crafted to mimic human style and evade detection. To address these gaps, we first introduce a curated dataset of 8000 SO-ChatGPT snippet pairs generated using multiple adversarial prompts. Whereas earlier methods relied solely on pre-trained models, we propose an ensemble approach that combines stylometric features of code with pre-trained embeddings to improve detection performance. Finally, we deploy our fine-tuned model as a Google Chrome extension called `StackPlagger’, which flags AI-generated code in SO answers and displays AI confidence scores. A video demonstration and the associated artifacts of our tool can be found at \url{https://youtu.be/6O9Urp2mvbI} and \url{https://github.com/harsh-g1/StackPlagger}, respectively.
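
As a rough illustration of combining stylometric features with pre-trained embeddings, the following minimal sketch concatenates a few hand-crafted style features with a precomputed embedding vector. The specific features (comment ratio, average line length, blank-line ratio, whitespace ratio) and the function names `stylometric_features` and `fuse_features` are assumptions made for illustration; they are not the feature set or implementation used by StackPlagger. The embedding is assumed to come from some pre-trained code model and is passed in as a plain list of floats.

```python
from typing import List, Sequence


def stylometric_features(snippet: str) -> List[float]:
    """Compute a few illustrative style features for a code snippet.

    These particular features are assumptions chosen for illustration only.
    """
    lines = snippet.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    n = max(len(non_empty), 1)  # avoid division by zero on empty input

    # Fraction of non-empty lines that start with a comment marker.
    comment_ratio = sum(
        ln.lstrip().startswith(("#", "//")) for ln in non_empty
    ) / n
    # Mean length of non-empty lines, in characters.
    avg_line_len = sum(len(ln) for ln in non_empty) / n
    # Fraction of all lines that are blank.
    blank_ratio = (len(lines) - len(non_empty)) / max(len(lines), 1)
    # Fraction of all characters that are whitespace.
    ws_ratio = sum(ch.isspace() for ch in snippet) / max(len(snippet), 1)

    return [comment_ratio, avg_line_len, blank_ratio, ws_ratio]


def fuse_features(snippet: str, embedding: Sequence[float]) -> List[float]:
    """Concatenate stylometric features with a precomputed embedding.

    The embedding is a hypothetical input, assumed to be produced elsewhere
    by a pre-trained code model.
    """
    return stylometric_features(snippet) + list(embedding)
```

The fused vector could then be passed to any downstream classifier. Simple concatenation is only one possible fusion strategy, chosen here because it keeps the two feature families independent and easy to inspect.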