CodeMorph: Mitigating Data Leakage in Large Language Model Assessment (ICSE 2025 - Industry Challenge Track)

Who

Hongzhou Rao, Yanjie Zhao, Wenjie Zhu, Ling Xiao, Meizhen Wang, Haoyu Wang

Track

ICSE 2025 Industry Challenge Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 1 May 2025 14:30 - 14:45 at 211 - Industry Challenge Presentations Chair(s): Federica Sarro, Xin Xia

Abstract

Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many training datasets make it difficult to prevent data leakage entirely, even with time lag strategies. Consequently, generating new datasets through code perturbation has become essential. However, existing methods often fail to produce complex and diverse variations, struggle with complex cross-file dependencies, and lack support for multiple programming languages, which limits their effectiveness in enhancing LLM evaluations for coding tasks. To fill this gap, we propose CodeMorph, an approach designed to support multiple programming languages while preserving cross-file dependencies to mitigate data leakage. CodeMorph consists of two main components that work together to enhance the perturbation process. The first component employs 26 semantic-preserving transformation methods to iteratively perturb code, generating diverse variations while ensuring that the modified code remains compilable. The second component introduces a genetic algorithm-based selection algorithm, PESO, to identify the more effective perturbation method for each iteration by targeting lower similarity scores between the perturbed and original code, thereby enhancing overall perturbation effectiveness. Experimental results demonstrate that after applying CodeMorph, the accuracy of the LLM on code completion tasks across five programming languages decreased by an average of 24.67%, with Python showing the most significant reduction at 45%. The similarity score of code optimized by PESO is, on average, 7.01% lower than that of randomly perturbed code, peaking at a reduction of 42.86%. Additionally, overall accuracy dropped by an average of 15%, with a maximum decrease of 25%. These findings indicate that CodeMorph effectively reduces data contamination while PESO optimizes perturbation combinations for code.

Hongzhou Rao

Huazhong University of Science and Technology

Yanjie Zhao

Huazhong University of Science and Technology

China

Wenjie Zhu

Huazhong University of Science and Technology

Ling Xiao

Huazhong University of Science and Technology

Meizhen Wang

Huazhong University of Science and Technology

Haoyu Wang

Huazhong University of Science and Technology

China

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 1 May
Displayed time zone: Eastern Time (US & Canada) change

14:00 - 15:30	Industry Challenge PresentationsIndustry Challenge Track at 211 Chair(s): Federica Sarro University College London, Xin Xia Huawei

14:00 15m Talk		CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge GraphAward Winner Industry Challenge Track Hanxiang Xu Huazhong University of Science and Technology, Wei Ma , Ting Zhou Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Kai Chen Huazhong University of Science and Technology, Qiang Hu The University of Tokyo, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology
14:15 15m Talk		ClauseBench: Enhancing Software License Analysis with Clause-Level Benchmarking Industry Challenge Track Qiang Ke Huazhong University of Science and Technology, Xinyi Hou Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Haoyu Wang Huazhong University of Science and Technology
14:30 15m Talk		CodeMorph: Mitigating Data Leakage in Large Language Model Assessment Industry Challenge Track Hongzhou Rao Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Wenjie Zhu Huazhong University of Science and Technology, Ling Xiao Huazhong University of Science and Technology, Meizhen Wang Huazhong University of Science and Technology, Haoyu Wang Huazhong University of Science and Technology
14:45 15m Talk		CommitShield: Tracking Vulnerability Introduction and Fix in Version Control SystemsSecurity Industry Challenge Track Zhaonan Wu Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Chen Wei MYbank, Ant Group, Zirui Wan Huazhong University of Science and Technology, Yue Liu Monash University, Haoyu Wang Huazhong University of Science and Technology
15:00 15m Talk		Exploring Large Language Models for Analyzing Open Source License Conflicts: How Far Are We? Industry Challenge Track Xing Cui Institute of Software, Chinese Academy of Sciences, Jingzheng Wu Institute of Software, The Chinese Academy of Sciences, Xiang Ling Institute of Software, Chinese Academy of Sciences, Tianyue Luo Institute of Software, Chinese Academy of Sciences, Mutian Yang Beijing ZhongKeWeiLan Technology Co.,Ltd., Wenxiang Ou Institute of Software, Chinese Academy of Sciences
15:15 15m Talk		OSS-LCAF: Open-Source Software License Conflict Analysis Framework Industry Challenge Track Aditya Kahol TCS Research, Anka Chandrahas Tummepalli TCS Research, Preethu Rose Anish TCS Research