ChatGPT-Based Test Generation for Refactoring Engines Enhanced by Feature Analysis on Examples
Software refactoring is widely employed to improve software quality. However, conducting refactorings manually is tedious, time-consuming, and error-prone. Consequently, automated and semi-automated tool support for refactoring is highly desirable in industry, and most mainstream IDEs provide powerful refactoring support. However, complex refactoring engines are prone to errors, which in turn may result in imperfect or incorrect refactorings. To this end, in this paper, we propose a ChatGPT-based approach to testing refactoring engines. We first manually analyze bug reports and test cases associated with refactoring engines, and construct a feature library of fine-grained features that may trigger defects in refactoring engines. The approach automatically generates prompts from predefined prompt templates and features randomly selected from the feature library, requesting ChatGPT to generate test programs with the requested features. Test programs generated by ChatGPT are then forwarded to multiple refactoring engines for differential testing. To the best of our knowledge, this is the first approach to testing refactoring engines that guides test program generation with features derived from existing bugs, and the first in this line of work to exploit LLMs for test program generation. Our initial evaluation on four mainstream refactoring engines suggests that the proposed approach is effective. It identified a total of 115 previously unknown bugs in addition to 28 inconsistent refactoring behaviors among different engines. Of the 115 bugs, 78 have been manually confirmed by the original developers of the tested engines, i.e., IntelliJ IDEA, Eclipse, VScode-Java, and NetBeans.
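The pipeline the abstract describes (sample features from a library, fill a prompt template, hand the LLM-generated program to several engines, and flag disagreements) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature strings, template wording, and the lambda "engines" standing in for IntelliJ IDEA, Eclipse, etc. are all hypothetical.

```python
import random

# Hypothetical feature library distilled from refactoring-engine bug reports.
FEATURE_LIBRARY = [
    "an anonymous inner class capturing a local variable",
    "a generic method with a bounded wildcard parameter",
    "a lambda expression shadowing an enclosing field",
    "a nested try-with-resources statement",
]

PROMPT_TEMPLATE = (
    "Generate a compilable Java program that contains {features}. "
    "The program will be used to test the '{refactoring}' refactoring."
)

def build_prompt(refactoring, n_features=2, rng=None):
    """Fill the prompt template with randomly selected features."""
    rng = rng or random.Random(0)
    chosen = rng.sample(FEATURE_LIBRARY, n_features)
    return PROMPT_TEMPLATE.format(
        features=" and ".join(chosen), refactoring=refactoring)

def differential_test(program, engines):
    """Apply every engine to the same test program and group engines
    by the refactored output they produce.  More than one group means
    the engines disagree, which warrants manual triage."""
    groups = {}
    for name, engine in engines.items():
        groups.setdefault(engine(program), []).append(name)
    return groups

# Stand-in engines: two agree, one deviates on the same rename.
engines = {
    "engine_a": lambda p: p.replace("foo", "bar"),
    "engine_b": lambda p: p.replace("foo", "bar"),
    "engine_c": lambda p: p,  # silently fails to rename
}
result = differential_test("int foo = 1;", engines)
```

Here `result` has two output groups, exposing `engine_c` as inconsistent with the other two; in the real approach the program under test comes from ChatGPT rather than a literal string, and the disagreeing outputs are inspected to decide which engine is buggy.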
Fri 2 May (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | AI for SE 3 (New Ideas and Emerging Results (NIER) / Journal-first Papers / Research Track / SE In Practice (SEIP)) at Canada Hall 1 and 2. Chair(s): Ying Zou (Queen's University, Kingston, Ontario)
11:00 | 15m Talk | A First Look at Conventional Commits Classification (Research Track). Qunhong Zeng, Yuxia Zhang, Zhiqing Qiu, Hui Liu (Beijing Institute of Technology)
11:15 | 15m Talk | ChatGPT-Based Test Generation for Refactoring Engines Enhanced by Feature Analysis on Examples (Research Track). Chunhao Dong (Beijing Institute of Technology), Yanjie Jiang (Peking University), Yuxia Zhang (Beijing Institute of Technology), Yang Zhang (Hebei University of Science and Technology), Hui Liu (Beijing Institute of Technology)
11:30 | 15m Talk | SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing (Research Track). Wenchao Gu (The Chinese University of Hong Kong), Ensheng Shi (Xi'an Jiaotong University), Yanlin Wang (Sun Yat-sen University), Lun Du (Microsoft Research), Shi Han (Microsoft Research), Hongyu Zhang (Chongqing University), Dongmei Zhang (Microsoft Research), Michael Lyu (The Chinese University of Hong Kong)
11:45 | 15m Talk | UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation (New Ideas and Emerging Results (NIER)). Liangying Shao (School of Informatics, Xiamen University, China), Yanfu Yan (William & Mary), Denys Poshyvanyk (William & Mary), Jinsong Su (School of Informatics, Xiamen University, China)
12:00 | 15m Talk | How is Google using AI for internal code migrations? (SE In Practice (SEIP)). Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Siddharth Taneja, Celal Ziftci, Satish Chandra (Google, Inc.)
12:15 | 7m Talk | LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation (Journal-first Papers). Sarah Fakhoury (Microsoft Research), Aaditya Naik (University of Pennsylvania), Georgios Sakkas (University of California at San Diego), Saikat Chakraborty (Microsoft Research), Shuvendu K. Lahiri (Microsoft Research)
12:22 | 7m Talk | The Impact of Concept Drift and Data Leakage on Log Level Prediction Models (Journal-first Papers). Youssef Esseddiq Ouatiti (Queen's University), Mohammed Sayagh (ETS Montreal, University of Quebec), Noureddine Kerzazi (Ensias-Rabat), Bram Adams (Queen's University), Ahmed E. Hassan (Queen's University)