ICSME 2025
Sun 7 - Fri 12 September 2025, Auckland, New Zealand
Fri 12 Sep 2025 13:55 - 14:05 at Case Room 3 260-055 - Session 15 - Reuse 2 Chair(s): Elliott Wen

The field of software engineering and coding has undergone a significant transformation. The integration of large language models (LLMs), such as ChatGPT, into software development workflows has changed how developers at all skill levels approach coding tasks. Leveraging the capabilities of LLMs, developers can now implement functionalities, fix bugs, and address reviewers’ comments more efficiently. However, prior research shows that the effectiveness of LLM-generated code is heavily influenced by the prompting strategy used. Furthermore, generating code at the class level is significantly more complex than at the method level, as it requires maintaining consistency across multiple methods and managing class state. Therefore, in this study, we evaluate the impact of four prompting strategies (i.e., Zero-Shot, Few-Shot, Chain-of-Thought (CoT), and CoT Few-Shot) on GPT’s performance in generating class-level code. We assess both the functional correctness and the quality characteristics of the generated code. To better understand how errors differ by prompting strategy, we conduct a qualitative analysis of the generated code for test cases that fail. We find that strategies incorporating more contextual guidance (Few-Shot, CoT, and CoT Few-Shot) outperform Zero-Shot prompting by up to 25% in functional correctness, 31% in BLEU-3 score, and 50% in ROUGE-L, while also producing code that is more readable and maintainable. Also, our results indicate that procedural logic and control flow errors are the most prominent, accounting for 25% of all errors. We believe our study provides valuable insights to guide future research in developing techniques and tools that enhance the quality and reliability of LLM-generated code for complex software development tasks.
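The four prompting strategies compared in the talk can be illustrated as prompt templates. The sketch below is a hypothetical illustration: the helper name `build_prompt`, the template wording, and the example task are assumptions, not the authors' actual prompts.

```python
# Illustrative sketch of the four prompting strategies evaluated in the talk
# (Zero-Shot, Few-Shot, Chain-of-Thought, CoT Few-Shot). The wording is an
# assumption, not the study's actual prompt text.

def build_prompt(task: str, strategy: str = "zero_shot") -> str:
    """Assemble a class-generation prompt under one of four strategies."""
    example = (
        "Example:\n"
        "Task: Implement a Stack class with push and pop.\n"
        "Solution:\n"
        "class Stack:\n"
        "    def __init__(self): self._items = []\n"
        "    def push(self, x): self._items.append(x)\n"
        "    def pop(self): return self._items.pop()\n"
    )
    cot = ("Think step by step: plan the class state and each method "
           "before writing code.\n")

    parts = []
    if strategy in ("few_shot", "cot_few_shot"):
        parts.append(example)  # Few-Shot: prepend a worked example
    if strategy in ("cot", "cot_few_shot"):
        parts.append(cot)      # Chain-of-Thought: request explicit reasoning
    parts.append(f"Task: {task}")  # Zero-Shot: only the task description
    return "\n".join(parts)
```

Zero-Shot sends only the task description; the other strategies add worked examples, a reasoning instruction, or both, i.e., the additional contextual guidance that the study reports improves functional correctness by up to 25%.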

Fri 12 Sep

Displayed time zone: Auckland, Wellington

13:30 - 15:00
Session 15 - Reuse 2 (NIER Track / Industry Track / Research Papers Track) at Case Room 3 260-055
Chair(s): Elliott Wen, The University of Auckland
13:30
15m
AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection
Research Papers Track
Zixian Zhang, School of Computer Science, University of Galway; Takfarinas Saber, School of Computer Science, University of Galway
13:45
10m
Client–Library Compatibility Testing with API Interaction Snapshots
NIER Track
Gustave Monce, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI; Thomas Degueule, CNRS; Jean-Rémy Falleri, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Institut Universitaire de France; Romain Robbes, CNRS, LaBRI, University of Bordeaux
Pre-print
13:55
10m
Prompting Matters: Assessing the Effect of Prompting Techniques on LLM-Generated Class Code
NIER Track
Adam Yuen, University of Calgary; John Pangas, University of Calgary; Md Mainul Hasan Polash, University of Calgary; Ahmad Abdellatif, University of Calgary
14:05
10m
From First Use to Final Commit: Studying the Evolution of Multi-CI Service Adoption
NIER Track
Nitika Chopra, Trent University; Taher A. Ghaleb, Trent University
Pre-print
14:15
15m
Automated Recovery of Software Product Lines from Legacy Configurable Codebases
Industry Track
Tewfik Ziadi, University of Doha for Science and Technology (UDST); Karim Ghallab, Sorbonne Université - RedFabriQ/Mobioos; Zaak Chalal, RedFabriQ/Mobioos
14:30
15m
Integrating Rules and Semantics for LLM-Based C-to-Rust Translation
Industry Track
Feng Luo, Harbin Institute of Technology (Shenzhen); Kexing Ji, Harbin Institute of Technology (Shenzhen); Cuiyun Gao, Harbin Institute of Technology, Shenzhen; Shuzheng Gao, Chinese University of Hong Kong; jiafeng, Harbin Institute of Technology (Shenzhen); Kui Liu, Huawei; Xin Xia, Zhejiang University; Michael Lyu, The Chinese University of Hong Kong