ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Thu 16 Apr 2026 12:15 - 12:30 at Asia IV - AI for Software Engineering 11 Chair(s): Timothy Lethbridge

The success of large language models for code relies on vast amounts of code data, including public open-source repositories, such as those on GitHub, and private, confidential code from companies. This raises concerns about intellectual property compliance and the potential unauthorized use of license-restricted code. While membership inference (MI) techniques have been proposed to detect such unauthorized usage, their effectiveness can be undermined by semantically equivalent code transformation techniques, which modify code syntax while preserving semantics.

In this work, we systematically investigate whether semantically equivalent code transformation rules might be leveraged to evade MI detection. The results reveal that model accuracy drops by only 1.5% in the worst case for each rule, demonstrating that transformed datasets can effectively serve as substitutes for fine-tuning. Additionally, we find that one of the rules (RenameVariable) reduces MI success by 10.19%, highlighting its potential to obscure the presence of restricted code. To validate these findings, we conduct a causal analysis confirming that variable renaming has the strongest causal effect in disrupting MI detection. Notably, we find that combining multiple transformations does not further reduce MI effectiveness. Our results expose a critical loophole in license compliance enforcement for training large language models for code, showing that MI detection can be substantially weakened by transformation-based obfuscation techniques.
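To make the RenameVariable idea concrete, here is a minimal sketch of a semantically equivalent renaming transformation on Python source. It is an illustration under stated assumptions, not the authors' implementation: the class `RenameVariable`, the helper functions `rename_variables` and `run`, the sample function `total`, and the `v0`..`v3` naming scheme are all hypothetical, and the sketch renames every matching identifier without any scope analysis. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename identifiers according to a fixed mapping (hypothetical helper)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes both appear as ast.Name nodes.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Function parameters are ast.arg nodes, not ast.Name nodes.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def rename_variables(source, mapping):
    """Parse the source, apply the renaming, and emit the new source text."""
    tree = RenameVariable(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source, *args):
    """Execute a snippet that defines `total` and call it with the arguments."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](*args)


original = """
def total(prices, tax):
    result = 0
    for p in prices:
        result += p * (1 + tax)
    return result
"""

# Assumed mapping: descriptive names are replaced with opaque ones.
transformed = rename_variables(
    original, {"prices": "v0", "tax": "v1", "result": "v2", "p": "v3"}
)

# The surface text changes, but the behavior on the same input does not.
same_behavior = run(original, [1.0, 2.0], 0.5) == run(transformed, [1.0, 2.0], 0.5)
```

The transformed snippet shares no variable names with the original yet computes the same result, which is the property the abstract describes as modifying syntax while preserving semantics. A production-grade rule would also need scope analysis so that globals, builtins, and attribute names are not renamed by accident.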

Thu 16 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
AI for Software Engineering 11
Research Track / SE In Practice (SEIP) at Asia IV
Chair(s): Timothy Lethbridge University of Ottawa
11:00
15m
Talk
LLM-based Agents for Automated Bug Fixing: How Far Are We?
Research Track
Xiangxin Meng ByteDance, Zexiong Ma Peking University, Pengfei Gao ByteDance, Chao Peng ByteDance
11:15
15m
Talk
Depradar: Agentic Coordination for Context-Aware Defect Impact Analysis in Deep Learning Libraries
Virtual Attendance
Research Track
Yi Gao Zhejiang University, Xing Hu Zhejiang University, Tongtong Xu Huawei, Jiali Zhao Huawei, Xiaohu Yang Zhejiang University, Xin Xia Zhejiang University
11:30
15m
Talk
Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
SE In Practice (SEIP)
José Pablo Cambronero Google, USA, Michele Tufano Google, Sherry Shi Google, Renyao Wei Google, Grant Uy Google, Sam Cheng Google, Chin-Jung Liu Google, Shiying Pan Google, Satish Chandra Meta Platforms, Inc., Patrick Rondon Google
11:45
15m
Talk
OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies
SE In Practice (SEIP)
12:00
15m
Talk
Intelligent Triage: Interpretable Incident Triage Workflow using LLM Extracted Triage Reasoning
Virtual Attendance
SE In Practice (SEIP)
Jianing Liu Fudan University, Hao Ren University of Illinois Urbana-Champaign, Yu Kang Microsoft, Minghua Ma Microsoft, Fangkai Yang Microsoft Research, Yong Xu Microsoft Research, Xin Gao Microsoft 365, Meng Zhang, Hongbin Wang Microsoft, Xuedong Gao Microsoft, Qingwei Lin Microsoft, Yingnong Dang Microsoft Azure, Saravan Rajmohan Microsoft, Dongmei Zhang Microsoft, Qi Zhang Microsoft, Chetan Bansal Microsoft Research, Yangfan Zhou Fudan University
12:15
15m
Talk
How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?
Research Track
Hua Yang North Carolina State University, Alejandro Velasco William & Mary, Thanh Le-Cong Singapore University of Technology and Design, Singapore, Md Nazmul Haque North Carolina State University, Bowen Xu North Carolina State University, Denys Poshyvanyk William & Mary