On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions (ICPC 2024 - Research Track) - ICPC 2024

Sun 14 - Sat 20 April 2024 Lisbon, Portugal

co-located with ICSE 2024

Who

Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota

Track

ICPC 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Apr 2024 14:20 - 14:30 at Sophia de Mello Breyner Andresen - Code + Documentation Generation Chair(s): Massimiliano Di Penta

Abstract

Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are often powered by deep learning (DL) models. However, the swift evolution of programming languages poses a critical challenge to the performance of DL-based code completion models: Can these models generalize across different language versions? This paper investigates that question. In particular, we assess the capabilities of a state-of-the-art model, CodeT5, to generalize across nine different Java versions, ranging from Java 2 to Java 17, while being exclusively trained on Java 8 code. Our evaluation spans three completion scenarios, namely, predicting tokens, constructs (e.g., the condition of an if statement) and entire code blocks. The results of our study reveal a noticeable disparity among language versions, with the worst performance obtained on Java 2 and Java 17, the versions furthest from Java 8. We investigate possible causes for the performance degradation and show that a limited amount of version-specific fine-tuning can partially alleviate the problem. Our work raises awareness of the importance of continuous model refinement, and it can inform the design of alternatives to make code completion models more robust to language evolution.

Matteo Ciniselli

Università della Svizzera Italiana

Alberto Martin-Lopez

Software Institute - USI, Lugano

Switzerland

Gabriele Bavota

Software Institute @ Università della Svizzera Italiana

Switzerland

Session Program

Mon 15 Apr
Displayed time zone: Lisbon

	14:00 - 15:30	Code + Documentation Generation	Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE)	at Sophia de Mello Breyner Andresen	Chair(s): Massimiliano Di Penta, University of Sannio, Italy

	14:00 10m Talk		MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation	ICPC Full paper, Research Track	Xinglu Pan, Peking University; Chenxiao Liu, Peking University; Yanzhen Zou, Peking University; Tao Xie, Peking University; Bing Xie, Peking University	Pre-print
	14:10 10m Talk		Compositional API Recommendation for Library-Oriented Code Generation	ICPC Full paper, Research Track	Zexiong Ma, Peking University; Shengnan An, Xi’an Jiaotong University; Bing Xie, Peking University; Zeqi Lin, Microsoft Research, China	Pre-print
	14:20 10m Talk		On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions	ICPC Full paper, Research Track	Matteo Ciniselli, Università della Svizzera Italiana; Alberto Martin-Lopez, Software Institute - USI, Lugano; Gabriele Bavota, Software Institute @ Università della Svizzera Italiana
	14:30 10m Talk		ESGen: Commit Message Generation Based on Edit Sequence of Code Change	ICPC Full paper, Virtual-Talk, Research Track	Xiangping Chen, Sun Yat-sen University; Yangzi Li, Sun Yat-sen University; Zhicao Tang, Sun Yat-sen University; Yuan Huang, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Haojie Zhou, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Mingdong Tang, Guangdong University of Foreign Studies; Zibin Zheng, Sun Yat-sen University
	14:40 10m Talk		Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention	ICPC Full paper, Virtual-Talk, Research Track	Yu Xia, Central South University; Tian Liang, Central South University; Wei-Huan Min, Central South University; Li Kuang, School of Computer Science and Engineering, Central South University
	14:50 10m Talk		Exploring and Improving Code Completion for Test Code	ICPC Full paper, Research Track	Tingwei Zhu, Nanjing University; Zhongxin Liu, Zhejiang University; Tongtong Xu, Huawei; Ze Tang, Software Institute, Nanjing University; Tian Zhang, Nanjing University; Minxue Pan, Nanjing University; Xin Xia, Huawei Technologies
	15:00 10m Talk		Understanding the Impact of Branch Edit Features for the Automatic Prediction of Merge Conflict Resolutions	ICPC RENE Paper, Replications and Negative Results (RENE)	Waad riadh aldndni, Virginia Tech; Francisco Servant, ITIS Software, University of Malaga; Na Meng, Virginia Tech
	15:10 4m Talk		Investigating the Efficacy of Large Language Models for Code Clone Detection	ICPC ERA Paper, Early Research Achievements (ERA)	Mohamad Khajezade, University of British Columbia Okanagan; Jie JW Wu, University of British Columbia (UBC); Fatemeh Hendijani Fard, University of British Columbia; Gema Rodríguez-Pérez, University of British Columbia (UBC); Mohamed S Shehata, University of British Columbia
	15:14 16m Talk		Code + Documentation Generation: Panel with Speakers	ICPC Discussion