ICSME 2025
Sun 7 - Fri 12 September 2025, Auckland, New Zealand
Fri 12 Sep 2025 13:55 - 14:05 at Case Room 3 260-055 - Session 15 - Reuse 2 Chair(s): Elliott Wen

The field of software engineering and coding has undergone a significant transformation. The integration of large language models (LLMs), such as ChatGPT, into software development workflows has changed how developers at all skill levels approach coding tasks. Leveraging the capabilities of LLMs, developers can now implement functionalities, fix bugs, and address reviewers’ comments more efficiently. However, prior research shows that the effectiveness of LLM-generated code is heavily influenced by the prompting strategy used. Furthermore, generating code at the class level is significantly more complex than at the method level, as it requires maintaining consistency across multiple methods and managing class state. Therefore, in this study, we evaluate the impact of four prompting strategies (i.e., Zero-Shot, Few-Shot, Chain-of-Thought (CoT), and CoT Few-Shot) on GPT’s performance in generating class-level code. We assess both the functional correctness and the quality characteristics of the generated code. To better understand how errors differ by prompting strategy, we conduct a qualitative analysis of the generated code for test cases that fail. We find that strategies incorporating more contextual guidance (Few-Shot, CoT, and CoT Few-Shot) outperform Zero-Shot prompting by up to 25% in functional correctness, 31% in BLEU-3 score, and 50% in ROUGE-L, while also producing code that is more readable and maintainable. Also, our results indicate that procedural logic and control flow errors are the most prominent, accounting for 25% of all errors. We believe our study provides valuable insights to guide future research in developing techniques and tools that enhance the quality and reliability of LLM-generated code for complex software development tasks.
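The four prompting strategies compared in the talk can be illustrated as prompt templates. The sketch below is a hypothetical illustration: the helper name `build_prompt`, the template wording, and the example task are assumptions, not the authors' actual prompts.

```python
# Illustrative sketch of the four prompting strategies evaluated in the talk
# (Zero-Shot, Few-Shot, Chain-of-Thought, CoT Few-Shot). The wording is an
# assumption, not the study's actual prompt text.

def build_prompt(task: str, strategy: str = "zero_shot") -> str:
    """Assemble a class-generation prompt under one of four strategies."""
    example = (
        "Example:\n"
        "Task: Implement a Stack class with push and pop.\n"
        "Solution:\n"
        "class Stack:\n"
        "    def __init__(self): self._items = []\n"
        "    def push(self, x): self._items.append(x)\n"
        "    def pop(self): return self._items.pop()\n"
    )
    cot = ("Think step by step: plan the class state and each method "
           "before writing code.\n")

    parts = []
    if strategy in ("few_shot", "cot_few_shot"):
        parts.append(example)  # Few-Shot: prepend a worked example
    if strategy in ("cot", "cot_few_shot"):
        parts.append(cot)      # Chain-of-Thought: request explicit reasoning
    parts.append(f"Task: {task}")  # Zero-Shot: only the task description
    return "\n".join(parts)
```

Zero-Shot sends only the task description; the other strategies add worked examples, a reasoning instruction, or both, i.e., the additional contextual guidance that the study reports improves functional correctness by up to 25%.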

Fri 12 Sep

Displayed time zone: Auckland, Wellington

13:30 - 15:00
Session 15 - Reuse 2 (NIER Track / Industry Track / Research Papers Track) at Case Room 3 260-055
Chair(s): Elliott Wen, The University of Auckland
13:30
15m
AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection
Research Papers Track
Zixian Zhang, School of Computer Science, University of Galway; Takfarinas Saber, School of Computer Science, University of Galway
13:45
10m
Client–Library Compatibility Testing with API Interaction Snapshots
NIER Track
Gustave Monce, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI; Thomas Degueule, CNRS; Jean-Rémy Falleri, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Institut Universitaire de France; Romain Robbes, CNRS, LaBRI, University of Bordeaux
Pre-print
13:55
10m
Prompting Matters: Assessing the Effect of Prompting Techniques on LLM-Generated Class Code
NIER Track
Adam Yuen, University of Calgary; John Pangas, University of Calgary; Md Mainul Hasan Polash, University of Calgary; Ahmad Abdellatif, University of Calgary
14:05
10m
From First Use to Final Commit: Studying the Evolution of Multi-CI Service Adoption
NIER Track
Nitika Chopra, Trent University; Taher A. Ghaleb, Trent University
Pre-print
14:15
15m
Automated Recovery of Software Product Lines from Legacy Configurable Codebases
Industry Track
Tewfik Ziadi, University of Doha for Science and Technology (UDST); Karim Ghallab, Sorbonne Université - RedFabriQ/Mobioos; Zaak Chalal, RedFabriQ/Mobioos
14:30
15m
Integrating Rules and Semantics for LLM-Based C-to-Rust Translation
Industry Track
Feng Luo, Harbin Institute of Technology (Shenzhen); Kexing Ji, Harbin Institute of Technology (Shenzhen); Cuiyun Gao, Harbin Institute of Technology, Shenzhen; Shuzheng Gao, Chinese University of Hong Kong; jiafeng, Harbin Institute of Technology (Shenzhen); Kui Liu, Huawei; Xin Xia, Zhejiang University; Michael Lyu, The Chinese University of Hong Kong