ICSME 2025
Sun 7 - Fri 12 September 2025 Auckland, New Zealand

AI-generated code has become an integral part of the mainstream developer workflow today. However, on community-driven platforms like Stack Overflow (SO), where trust, authorship, and credibility matter, AI-generated content raises concerns about AI plagiarism. While some recent studies have focused on detecting AI-generated code, they have mostly worked with large code samples from standard repositories and programming competitions. In contrast, code snippets on SO are often small and context-specific, making them much more difficult to detect. Moreover, prior studies have overlooked another aspect: recognizing adversarially prompted AI-generated code that is deliberately crafted to resemble human-written code. To address these limitations, we first introduce a large-scale dataset comprising 3,500 pairs of SO and ChatGPT answers, along with a curated set of 4,500 adversarially prompted AI responses. Next, we evaluate existing code language models on this newly curated dataset. Our evaluation shows that existing models perform well on standard AI answers but fail to detect adversarial ones. Finally, to improve detection, we propose an ensemble approach that combines stylometric features of code with code embeddings. Our approach shows consistent improvements across multiple models and generalizes better to adversarially prompted code. We release our full dataset to facilitate further research in this area.
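The abstract's ensemble idea (combining stylometric features with code embeddings) can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the specific features (average line length, comment ratio, indentation ratio, average token length) and the placeholder embedding are hypothetical choices for demonstration.

```python
import statistics

def stylometric_features(code: str) -> list[float]:
    """Extract simple hand-crafted style features from a code snippet.
    These four features are hypothetical examples, not the paper's set."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    if not lines:
        return [0.0, 0.0, 0.0, 0.0]
    avg_line_len = statistics.mean(len(ln) for ln in lines)
    comment_ratio = sum(ln.lstrip().startswith("#") for ln in lines) / len(lines)
    indent_ratio = sum(ln.startswith((" ", "\t")) for ln in lines) / len(lines)
    tokens = code.split()
    avg_token_len = statistics.mean(len(t) for t in tokens) if tokens else 0.0
    return [avg_line_len, comment_ratio, indent_ratio, avg_token_len]

def combined_vector(code: str, embedding: list[float]) -> list[float]:
    """Concatenate stylometric features with a code embedding; in practice
    the embedding would come from a pretrained code language model."""
    return stylometric_features(code) + embedding

snippet = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
fake_embedding = [0.1, -0.2, 0.05]  # placeholder for a real model's output
vec = combined_vector(snippet, fake_embedding)
print(len(vec))  # 4 style features + 3 embedding dimensions = 7
```

The concatenated vector would then be fed to a downstream classifier trained to separate human-written from AI-generated snippets.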