How Does ChatGPT Make Assumptions When Creating Erroneous Programs?
This program is tentative and subject to change.
Large Language Models (LLMs) like ChatGPT are increasingly integrated into software development environments due to their strong performance in code generation. However, they often struggle with complex logic, security vulnerabilities, and code quality issues. These problems frequently originate from misunderstandings of problem requirements and logical inconsistencies, which can lead to faulty or vulnerable software. In this study, we conduct an initial empirical analysis to investigate the causes of erroneous code generated by the state-of-the-art LLM GPT-4o. Using the HumanEval dataset, we prompt GPT-4o to generate Python solutions and to list its 3 most important assumptions. We validate these outputs against the test cases provided in the dataset and identify 17 defective programs out of 164 total solutions. By analyzing the 17 failures and the 51 assumptions made on these tasks, we find that about 53% of the failures are directly related to wrong or erroneously implemented assumptions raised by the GPT model itself, and that in total 71% of code generation failures are related to erroneously made or implemented assumptions.
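The validation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' harness: the toy task mimics HumanEval's JSONL format (`prompt`, `test`, `entry_point`, where `test` defines a `check(candidate)` function), and the hard-coded `completion` string stands in for a real GPT-4o response.

```python
# Toy task in HumanEval's format; in the study, 164 real tasks are used.
task = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}

# Stand-in for the model's completion (in the study this comes from GPT-4o,
# alongside the 3 assumptions the model is asked to state).
completion = "    return a + b\n"

def passes_tests(task, completion):
    """Run the completed program against the dataset's check() function."""
    program = task["prompt"] + completion + "\n" + task["test"]
    namespace = {}
    try:
        exec(program, namespace)
        namespace["check"](namespace[task["entry_point"]])
        return True  # all assertions in check() passed
    except Exception:
        return False  # a failing task; its stated assumptions are then analyzed

print(passes_tests(task, completion))
```

A solution that fails any assertion in `check()` would be counted among the defective programs, whose stated assumptions are then examined manually.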
Mon 17 Nov (displayed time zone: Seoul)
16:00 - 17:00

16:00 10m Talk (Industry Showcase): A Characterization Study of Bugs in LLM Agent Workflow Orchestration Frameworks
Ziluo Xue, Yanjie Zhao, Shenao Wang, Kai Chen, Haoyu Wang (Huazhong University of Science and Technology)

16:10 10m Talk (NIER Track): Debugging the Undebuggable: Why Multi-Fault Programs Break Debugging and Repair Tools
Omar I. Al-Bataineh (Gran Sasso Science Institute (GSSI))

16:20 10m Talk (Industry Showcase): ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems
Junsong Pu (School of Software Engineering, Sun Yat-sen University), Yichen LI (ByteDance), Zhuangbin Chen (Sun Yat-sen University), Jinyang Liu (ByteDance), Zhihan Jiang (The Chinese University of Hong Kong), Jianjun Chen (ByteDance), Rui Shi (ByteDance), Zibin Zheng (Sun Yat-sen University), Tieying Zhang (ByteDance)

16:30 10m Talk (NIER Track): Fault Injection for Simulink-based CPS Models: Insights and Future Directions
Drishti Yadav (University of Luxembourg, Luxembourg), Claudio Mandrioli (University of Luxembourg), Ezio Bartocci (TU Wien), Domenico Bianculli (University of Luxembourg)

16:40 10m Talk (NIER Track): How Does ChatGPT Make Assumptions When Creating Erroneous Programs?

16:50 10m Talk (NIER Track): Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
Ruofan Lu (The Chinese University of Hong Kong), Yichen LI (ByteDance), Yintong Huo (Singapore Management University, Singapore)