Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension
Previous studies have demonstrated that neural code comprehension models are vulnerable to identifier naming. Renaming as few as one identifier in the source code can cause a model to output completely irrelevant results, indicating that identifiers can mislead model prediction. However, identifiers are not purely detrimental to code comprehension, since the semantics of identifier names can be related to the program semantics. Exploiting these two opposing impacts of identifiers effectively is essential for enhancing the robustness and accuracy of neural code comprehension, yet it remains under-explored. In this work, we model the impact of identifiers from a novel causal perspective and propose a counterfactual reasoning-based framework named CREAM. CREAM explicitly captures the misleading information of identifiers through multi-task learning in the training stage, and reduces the misleading impact through counterfactual inference in the inference stage. We evaluate CREAM on three popular neural code comprehension tasks: function naming, defect detection, and code classification. Experimental results show that CREAM not only significantly outperforms baselines in terms of robustness on the datasets with identifiers renamed (e.g., +37.9% F1 score on the function naming task), but also achieves relatively better results on the original datasets (e.g., +0.5% F1 score on the function naming task).
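To make the renaming perturbation mentioned in the abstract concrete, the following is a minimal sketch, not the authors' tooling. It assumes Python 3.9+ (for `ast.unparse`); the helper `rename_identifier`, the sample function `total`, and the substitute name `filename` are all hypothetical and chosen only for illustration. It shows that renaming one identifier leaves program behavior unchanged while altering the tokens a model would see.

```python
import ast


class RenameIdentifier(ast.NodeTransformer):
    """Rename every occurrence of one variable or argument name."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Covers both reads and writes of the variable.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_identifier(source, old, new):
    """Return `source` with identifier `old` replaced by `new` (hypothetical helper)."""
    tree = RenameIdentifier(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source, values):
    """Execute the snippet and call its `total` function on `values`."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](values)


ORIGINAL = (
    "def total(prices):\n"
    "    result = 0\n"
    "    for p in prices:\n"
    "        result += p\n"
    "    return result\n"
)

# A deliberately misleading name: the semantics are identical,
# but the surface tokens now suggest something unrelated.
RENAMED = rename_identifier(ORIGINAL, "result", "filename")

if __name__ == "__main__":
    print(RENAMED)
    print(run(ORIGINAL, [1, 2, 3]), run(RENAMED, [1, 2, 3]))
```

Both versions compute the same sum, which is why a robust comprehension model should ideally give consistent predictions for them.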
Fri 19 May (displayed time zone: Hobart)
11:00 - 12:30 | Program comprehension | Technical Track / Journal-First Papers at Meeting Room 103 | Chair(s): Oscar Chaparro, College of William and Mary
11:00 | 15m Talk | Code Comprehension Confounders: A Study of Intelligence and Personality | Journal-First Papers
11:15 | 15m Talk | Identifying Key Classes for Initial Software Comprehension: Can We Do It Better? | Technical Track | Weifeng Pan (Zhejiang Gongshang University, China), Xin Du (Zhejiang Gongshang University, China), Hua Ming (Oakland University), Dae-Kyoo Kim (Oakland University), Zijiang Yang (Xi'an Jiaotong University and GuardStrike Inc)
11:30 | 15m Talk | Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods | Technical Track | Daye Nam (Carnegie Mellon University), Brad A. Myers (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University), Vincent J. Hellendoorn (Carnegie Mellon University)
11:45 | 15m Talk | Evidence Profiles for Validity Threats in Program Comprehension Experiments | Technical Track | Marvin Muñoz Barón (University of Stuttgart), Marvin Wyrich (Saarland University), Daniel Graziotin (University of Stuttgart), Stefan Wagner (University of Stuttgart)
12:00 | 15m Talk | Developers' Visuo-spatial Mental Model and Program Comprehension | Technical Track
12:15 | 15m Talk | Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension | Technical Track | Shuzheng Gao (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Chaozheng Wang (Harbin Institute of Technology), Jun Sun (Singapore Management University), David Lo (Singapore Management University), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China)