Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension (ICSE 2023 - Technical Track)

Who

Shuzheng Gao, Cuiyun Gao, Chaozheng Wang, Jun Sun, David Lo, Yue Yu

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

When

Fri 19 May 2023 12:15 - 12:30 at Meeting Room 103 - Program comprehension Chair(s): Oscar Chaparro

Abstract

Previous studies have demonstrated that neural code comprehension models are vulnerable to identifier naming. By renaming as few as one identifier in source code, the models can be made to output completely irrelevant results, indicating that identifiers can mislead model prediction. However, identifiers are not entirely detrimental to code comprehension, since the semantics of identifier names can be related to the program semantics. Exploiting these two opposite impacts of identifiers well is essential for enhancing the robustness and accuracy of neural code comprehension, yet it remains under-explored. In this work, we model the impact of identifiers from a novel causal perspective and propose a counterfactual reasoning-based framework named CREAM. CREAM explicitly captures the misleading information of identifiers through multi-task learning in the training stage, and reduces the misleading impact by counterfactual inference in the inference stage. We evaluate CREAM on three popular neural code comprehension tasks: function naming, defect detection, and code classification. Experimental results show that CREAM not only significantly outperforms baselines in terms of robustness (e.g., +37.9% F1 score on the function naming task) on the datasets with identifiers renamed, but also achieves relatively better results on the original datasets (e.g., +0.5% F1 score on the function naming task).
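
The counterfactual-inference step mentioned in the abstract can be illustrated with a toy sketch. It assumes a common formulation in which the model has a full branch (code plus identifiers) and an identifier-only branch, and the identifier-only scores are subtracted at inference time. The subtraction rule, the `alpha` weight, the example scores, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
def counterfactual_debias(full_logits, identifier_only_logits, alpha=1.0):
    """Subtract the (hypothetical) identifier-only branch scores from the
    full-model scores, scaled by alpha, to reduce identifier-driven bias."""
    return [f - alpha * i for f, i in zip(full_logits, identifier_only_logits)]


def predict(logits):
    """Return the index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy scores for three classes (made-up numbers, for illustration only).
full_logits = [2.0, 2.5, 0.1]             # full input: code and identifiers
identifier_only_logits = [0.2, 1.5, 0.1]  # identifiers alone favor class 1

debiased = counterfactual_debias(full_logits, identifier_only_logits)

print(predict(full_logits))  # class chosen before debiasing
print(predict(debiased))     # class chosen after debiasing
```

In this toy case the full scores favor class 1 largely because the identifier-only branch does; after subtraction, class 0 scores highest. Whether CREAM combines its branches in exactly this way is not stated on this page.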

Shuzheng Gao

Harbin Institute of Technology

Cuiyun Gao

Harbin Institute of Technology

China

Chaozheng Wang

Harbin Institute of Technology

China

Jun Sun

Singapore Management University

Singapore

David Lo

Singapore Management University

Singapore

Yue Yu

College of Computer, National University of Defense Technology, Changsha 410073, China

China

Session Program

Fri 19 May
Displayed time zone: Hobart

11:00 - 12:30	Program comprehension (Technical Track / Journal-First Papers) at Meeting Room 103. Chair(s): Oscar Chaparro, College of William and Mary

11:00 15m Talk		Code Comprehension Confounders: A Study of Intelligence and Personality. Journal-First Papers. Stefan Wagner (University of Stuttgart), Marvin Wyrich (Saarland University)
11:15 15m Talk		Identifying Key Classes for Initial Software Comprehension: Can We Do It Better? Technical Track. Weifeng Pan (Zhejiang Gongshang University, China), Xin Du (Zhejiang Gongshang University, China), Hua Ming (Oakland University), Dae-Kyoo Kim (Oakland University), Zijiang Yang (Xi'an Jiaotong University and GuardStrike Inc)
11:30 15m Talk		Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods. Technical Track. Daye Nam (Carnegie Mellon University), Brad A. Myers (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University), Vincent J. Hellendoorn (Carnegie Mellon University)
11:45 15m Talk		Evidence Profiles for Validity Threats in Program Comprehension Experiments. Technical Track. Marvin Muñoz Barón (University of Stuttgart), Marvin Wyrich (Saarland University), Daniel Graziotin (University of Stuttgart), Stefan Wagner (University of Stuttgart)
12:00 15m Talk		Developers’ Visuo-spatial Mental Model and Program Comprehension. Technical Track. Abir Bouraffa (University of Hamburg), Gian-Luca Fuhrmann, Walid Maalej (University of Hamburg)
12:15 15m Talk		Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension. Technical Track. Shuzheng Gao (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Chaozheng Wang (Harbin Institute of Technology), Jun Sun (Singapore Management University), David Lo (Singapore Management University), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China)