The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation
The capabilities of Large Language Models (LLMs) in code generation, particularly for implementing target functionalities from natural language descriptions, have been extensively studied. As an alternative form of natural language, input-output examples (I/O examples) provide an accessible, unambiguous, and flexible way to describe functionalities, but the diversity, sparseness, and incompleteness of I/O examples also pose challenges for understanding and implementing requirements. Therefore, generating code from input-output examples (i.e., example-based code generation) provides a new perspective, allowing us to evaluate LLMs' capability to infer target functionalities from limited information and to process requirements in a new form. However, the capability of LLMs in example-based code generation remains largely unexplored. To fill this gap, this paper presents the first comprehensive study on example-based code generation using LLMs. To address the incorrectness caused by the incompleteness of I/O examples, we adopt an iterative evaluation framework and formalize the objective of example-based code generation as two sequential sub-objectives: generating code conforming to the given examples and generating code that successfully implements the target functionality from (iteratively) given examples. We assess six state-of-the-art LLMs using a new benchmark of 168 diverse target functionalities (derived from HumanEval and CodeHunt). The results demonstrate that when requirements were described using iterative input-output examples rather than natural language, the LLMs' score decreased by over 60%, indicating that example-based code generation remains challenging for the evaluated LLMs. More interestingly, the vast majority (over 95%) of successfully implemented functionalities are achieved in the first round of iteration, suggesting that the LLMs struggle to effectively utilize the iteratively supplemented requirements.
Furthermore, we find that combining I/O examples with even imprecise natural language descriptions significantly improves LLM performance, and that while the choice of initial I/O examples has a limited impact on the score for most functionalities, a subset of functionalities shows high sensitivity to the initial examples, suggesting opportunities for prompt optimization. These findings highlight the importance of early prompts during interactions and offer critical insights and implications for enhancing LLM-driven code generation.
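The distinction between the two sub-objectives can be illustrated with a small, hypothetical sketch (not taken from the paper's framework): a candidate program may conform to all given I/O examples yet still fail to implement the target functionality, which is why the evaluation supplies further examples iteratively. The `conforms` helper and the `overfit` candidate below are illustrative names of our own.

```python
# Minimal sketch (hypothetical, not the paper's evaluation framework) of the
# first sub-objective: checking that candidate code conforms to given I/O
# examples, and why conformance alone does not imply correct functionality.

def conforms(candidate, examples):
    """Return True if `candidate` maps every example input to its output."""
    return all(candidate(*inp) == out for inp, out in examples)

# Suppose the target functionality is "absolute value", described only by
# I/O examples rather than by a natural language specification:
examples = [((3,), 3), ((-5,), 5), ((0,), 0)]

# A candidate that memorizes the given examples and otherwise echoes its
# input: it conforms to the examples but is not the target functionality.
overfit = lambda x: {3: 3, -5: 5, 0: 0}.get(x, x)

# The intended implementation.
correct = lambda x: abs(x)

print(conforms(overfit, examples))   # True: passes the given examples
print(conforms(correct, examples))   # True
print(overfit(-7) == correct(-7))    # False: the overfit candidate fails
                                     # on an input outside the examples
```

Supplying the failing input `-7` (with its expected output `7`) as a new example in the next iteration would rule out the overfit candidate, mirroring how the iterative framework refines the requirement.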
Thu 26 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | Code and Documentation Generation | Research Papers / Tool Demonstrations at Cosmos Hall
Chair(s): Ying Zou (Queen's University, Kingston, Ontario)

14:00 (25m) Talk: The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation
Research Papers
Yingjie Fu (Peking University), Bozhou Li (Peking University), Linyi Li (Simon Fraser University), Wentao Zhang (Peking University), Tao Xie (Peking University)
DOI

14:25 (25m) Talk: VerLog: Enhancing Release Note Generation for Android Apps using Large Language Models
Research Papers
Jiawei Guo (University at Buffalo, SUNY), Haoran Yang (Washington State University), Haipeng Cai (University at Buffalo, SUNY)
DOI

14:50 (25m) Talk: Can LLMs replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering Tasks
Research Papers
Ruiqi Wang (Harbin Institute of Technology, Shenzhen), Jiyu Guo (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Guodong Fan (Shandong Agriculture and Engineering University), Chun Yong Chong (Huawei), Xin Xia (Zhejiang University)
DOI · Pre-print

15:15 (15m) Demonstration: Code2API: A Tool for Generating Reusable APIs from Stack Overflow Code Snippets
Tool Demonstrations
Yubo Mai (Zhejiang University), Zhipeng Gao (Shanghai Institute for Advanced Study - Zhejiang University), Xing Hu (Zhejiang University), Lingfeng Bao (Zhejiang University), Jingyuan Chen, JianLing Sun (Zhejiang University)
This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The two large doors with numbers “1” and “2” provide access to the Cosmos Hall.