The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation
The capabilities of Large Language Models (LLMs) in code generation, particularly for implementing target functionalities from natural language descriptions, have been extensively studied. As an alternative form of natural language, input-output examples (I/O examples) provide an accessible, unambiguous, and flexible way to describe functionalities, but the diversity, sparseness, and incompleteness of I/O examples also pose challenges for understanding and implementing requirements. Therefore, generating code from input-output examples (i.e., example-based code generation) provides a new perspective, allowing us to evaluate LLMs' capability to infer target functionalities from limited information and to process requirements in a new form. However, the capability of LLMs in example-based code generation remains largely unexplored. To fill this gap, this paper presents the first comprehensive study on example-based code generation using LLMs. To address the incorrectness caused by the incompleteness of I/O examples, we adopt an iterative evaluation framework and formalize the objective of example-based code generation as two sequential sub-objectives: generating code conforming to the given examples and generating code that successfully implements the target functionality from (iteratively) given examples. We assess six state-of-the-art LLMs using a new benchmark of 168 diverse target functionalities (derived from HumanEval and CodeHunt). The results demonstrate that when requirements were described using iterative input-output examples rather than natural language, the LLMs' score decreased by over 60%, indicating that example-based code generation remains challenging for the evaluated LLMs. More interestingly, the vast majority (over 95%) of successfully implemented functionalities are achieved in the first round of iteration, suggesting that the LLMs struggle to effectively utilize the iteratively supplemented requirements.
Furthermore, we find that combining I/O examples with even imprecise natural language descriptions significantly improves LLM performance, and that while the choice of initial I/O examples has a limited impact on the score for most functionalities, a subset of functionalities shows high sensitivity to the initial examples, suggesting opportunities for prompt optimization. These findings highlight the importance of early prompts during interactions and offer critical insights and implications for enhancing LLM-driven code generation.
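The distinction between the two sub-objectives can be illustrated with a small, hypothetical sketch (not taken from the paper's framework): a candidate program may conform to all given I/O examples yet still fail to implement the target functionality, which is why the evaluation supplies further examples iteratively. The `conforms` helper and the `overfit` candidate below are illustrative names of our own.

```python
# Minimal sketch (hypothetical, not the paper's evaluation framework) of the
# first sub-objective: checking that candidate code conforms to given I/O
# examples, and why conformance alone does not imply correct functionality.

def conforms(candidate, examples):
    """Return True if `candidate` maps every example input to its output."""
    return all(candidate(*inp) == out for inp, out in examples)

# Suppose the target functionality is "absolute value", described only by
# I/O examples rather than by a natural language specification:
examples = [((3,), 3), ((-5,), 5), ((0,), 0)]

# A candidate that memorizes the given examples and otherwise echoes its
# input: it conforms to the examples but is not the target functionality.
overfit = lambda x: {3: 3, -5: 5, 0: 0}.get(x, x)

# The intended implementation.
correct = lambda x: abs(x)

print(conforms(overfit, examples))   # True: passes the given examples
print(conforms(correct, examples))   # True
print(overfit(-7) == correct(-7))    # False: the overfit candidate fails
                                     # on an input outside the examples
```

Supplying the failing input `-7` (with its expected output `7`) as a new example in the next iteration would rule out the overfit candidate, mirroring how the iterative framework refines the requirement.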
Thu 26 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | Code and Documentation Generation | Research Papers / Tool Demonstrations at Cosmos Hall
Chair(s): Ying Zou (Queen's University, Kingston, Ontario)

14:00 (25m) Talk: The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation
Research Papers
Yingjie Fu (Peking University), Bozhou Li (Peking University), Linyi Li (Simon Fraser University), Wentao Zhang (Peking University), Tao Xie (Peking University)
DOI

14:25 (25m) Talk: VerLog: Enhancing Release Note Generation for Android Apps using Large Language Models
Research Papers
Jiawei Guo (University at Buffalo, SUNY), Haoran Yang (Washington State University), Haipeng Cai (University at Buffalo, SUNY)
DOI

14:50 (25m) Talk: Can LLMs replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering Tasks
Research Papers
Ruiqi Wang (Harbin Institute of Technology, Shenzhen), Jiyu Guo (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Guodong Fan (Shandong Agriculture and Engineering University), Chun Yong Chong (Huawei), Xin Xia (Zhejiang University)
DOI · Pre-print

15:15 (15m) Demonstration: Code2API: A Tool for Generating Reusable APIs from Stack Overflow Code Snippets
Tool Demonstrations
Yubo Mai (Zhejiang University), Zhipeng Gao (Shanghai Institute for Advanced Study - Zhejiang University), Xing Hu (Zhejiang University), Lingfeng Bao (Zhejiang University), Jingyuan Chen, JianLing Sun (Zhejiang University)
This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The two large doors with numbers “1” and “2” provide access to the Cosmos Hall.