CodeS: Towards Code Model Generalization Under Distribution Shift
Distribution shift has long challenged the reliable deployment of deep learning (DL) models due to unexpected accuracy degradation. Although DL has become a driving force for large-scale source code analysis in the big code era, limited progress has been made on distribution shift analysis and benchmarking for source code tasks. To fill this gap, this paper proposes CodeS, a distribution-shift benchmark dataset for source code learning. Specifically, CodeS supports two programming languages (Java and Python) and five shift types (task, programmer, time-stamp, token, and concrete syntax tree). Extensive experiments on CodeS reveal that 1) out-of-distribution detectors from other domains (e.g., computer vision) do not generalize to source code, 2) all code classification models suffer from distribution shift, 3) representation-based shifts affect models more strongly than the others, and 4) pre-trained bimodal models are comparatively more resistant to distribution shift.
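To make the shift types concrete, the sketch below illustrates one of them, the time-stamp shift: code samples are split chronologically so the test set contains only code written after a training cutoff. This is a minimal illustration of the idea, not the CodeS implementation; the sample data, the `timestamp_split` helper, and the cutoff date are all hypothetical.

```python
from datetime import datetime

# Hypothetical toy corpus: each sample pairs a code snippet with its
# authoring date. Newer snippets use newer syntax (async, match).
samples = [
    {"code": "def add(a, b): return a + b",      "date": datetime(2015, 3, 1)},
    {"code": "def mul(a, b): return a * b",      "date": datetime(2017, 6, 9)},
    {"code": "async def fetch(url): ...",        "date": datetime(2021, 1, 5)},
    {"code": "match cmd:\n    case 'q': ...",    "date": datetime(2022, 8, 2)},
]

def timestamp_split(samples, cutoff):
    """In-distribution data predates the cutoff; shifted data does not."""
    train = [s for s in samples if s["date"] < cutoff]
    shifted = [s for s in samples if s["date"] >= cutoff]
    return train, shifted

# Everything before 2020 trains the model; post-2020 code probes the shift.
train, shifted = timestamp_split(samples, datetime(2020, 1, 1))
```

A model trained only on `train` never sees post-cutoff idioms, so its accuracy on `shifted` measures degradation under this one shift type; the other four (task, programmer, token, concrete syntax tree) are defined analogously over different sample attributes.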
Wed 17 May (time zone: Hobart)
11:00 - 12:30 | AI models for SE (Journal-First Papers / Technical Track / DEMO - Demonstrations / NIER - New Ideas and Emerging Results) | Level G - Plenary Room 1 | Chair(s): Denys Poshyvanyk (College of William and Mary)
11:00 (15m) Talk | One Adapter for All Programming Languages? Adapter Tuning for Multilingual Tasks in Software Engineering | Technical Track | Deze Wang (National University of Defense Technology), Boxing Chen, Shanshan Li (National University of Defense Technology), Wei Luo, Shaoliang Peng (Hunan University), Wei Dong (School of Computer, National University of Defense Technology, China), Liao Xiangke (National University of Defense Technology)
11:15 (15m) Talk | CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back | Technical Track | Zhongxin Liu (Zhejiang University), Zhijie Tang (Zhejiang University), Xin Xia (Huawei), Xiaohu Yang (Zhejiang University) | Pre-print
11:30 (15m) Talk | Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models | Technical Track | Shuzheng Gao (Harbin Institute of Technology), Hongyu Zhang (The University of Newcastle), Cuiyun Gao (Harbin Institute of Technology), Chaozheng Wang (Harbin Institute of Technology)
11:45 (7m) Talk | PCR-Chain: Partial Code Reuse Assisted by Hierarchical Chaining of Prompts on Frozen Copilot | DEMO - Demonstrations | Qing Huang (School of Computer Information Engineering, Jiangxi Normal University), Jiahui Zhu (School of Computer Information Engineering, Jiangxi Normal University), Zhilong Li (School of Computer Information Engineering, Jiangxi Normal University), Zhenchang Xing, Changjing Wang (School of Computer Information Engineering, Jiangxi Normal University), Xiwei (Sherry) Xu (CSIRO's Data61)
11:52 (7m) Talk | Towards Learning Generalizable Code Embeddings using Task-agnostic Graph Convolutional Networks | Journal-First Papers | Zishuo Ding (Concordia University), Heng Li (Polytechnique Montréal), Weiyi Shang (University of Waterloo), Tse-Hsun (Peter) Chen (Concordia University)
12:00 (7m) Talk | deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search | Journal-First Papers | Chen Zeng (National University of Defense Technology), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China), Shanshan Li (National University of Defense Technology), Xin Xia (Huawei), Wang Zhiming (National University of Defense Technology), Mingyang Geng (National University of Defense Technology), Linxiao Bai (National University of Defense Technology), Wei Dong (School of Computer, National University of Defense Technology, China), Liao Xiangke (National University of Defense Technology)
12:07 (7m) Talk | CodeS: Towards Code Model Generalization Under Distribution Shift | NIER - New Ideas and Emerging Results | Qiang Hu (University of Luxembourg), Yuejun Guo (University of Luxembourg), Xiaofei Xie (Singapore Management University), Maxime Cordy (University of Luxembourg, Luxembourg), Lei Ma (University of Alberta), Mike Papadakis (University of Luxembourg, Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg)
12:15 (7m) Talk | Towards using Few-Shot Prompt Learning for Automating Model Completion | NIER - New Ideas and Emerging Results | Meriem Ben Chaaben (Université de Montréal, DIRO), Lola Burgueño (University of Malaga), Houari Sahraoui (Université de Montréal)