How Does ChatGPT Make Assumptions When Creating Erroneous Programs?
This program is tentative and subject to change.
Large Language Models (LLMs) like ChatGPT are increasingly integrated into software development environments due to their strong performance in code generation. However, they often struggle with complex logic, security vulnerabilities, and code quality issues. These problems frequently originate from misunderstandings of problem requirements and logical inconsistencies, which can lead to faulty or vulnerable software. In this study, we conduct an initial empirical analysis to investigate the causes of erroneous code generated by the state-of-the-art LLM GPT-4o. Using the HumanEval dataset, we prompt GPT-4o to generate Python solutions and to list its 3 most important assumptions. We validate these outputs against the test cases provided in the dataset and identify 17 defective programs out of 164 total solutions. By analyzing the 17 failures and the 51 assumptions made on these tasks, we find that about 53% of the failures are directly related to wrong or erroneously implemented assumptions raised by the GPT model itself, and that in total 71% of code generation failures are related to erroneously made or implemented assumptions.
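The validation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' harness: the toy task mimics HumanEval's JSONL format (`prompt`, `test`, `entry_point`, where `test` defines a `check(candidate)` function), and the hard-coded `completion` string stands in for a real GPT-4o response.

```python
# Toy task in HumanEval's format; in the study, 164 real tasks are used.
task = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}

# Stand-in for the model's completion (in the study this comes from GPT-4o,
# alongside the 3 assumptions the model is asked to state).
completion = "    return a + b\n"

def passes_tests(task, completion):
    """Run the completed program against the dataset's check() function."""
    program = task["prompt"] + completion + "\n" + task["test"]
    namespace = {}
    try:
        exec(program, namespace)
        namespace["check"](namespace[task["entry_point"]])
        return True  # all assertions in check() passed
    except Exception:
        return False  # a failing task; its stated assumptions are then analyzed

print(passes_tests(task, completion))
```

A solution that fails any assertion in `check()` would be counted among the defective programs, whose stated assumptions are then examined manually.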
Mon 17 Nov (displayed time zone: Seoul)
16:00 - 17:00

16:00 10m Talk (Industry Showcase): A Characterization Study of Bugs in LLM Agent Workflow Orchestration Frameworks
Ziluo Xue, Yanjie Zhao, Shenao Wang, Kai Chen, Haoyu Wang (Huazhong University of Science and Technology)

16:10 10m Talk (NIER Track): Debugging the Undebuggable: Why Multi-Fault Programs Break Debugging and Repair Tools
Omar I. Al-Bataineh (Gran Sasso Science Institute (GSSI))

16:20 10m Talk (Industry Showcase): ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems
Junsong Pu (School of Software Engineering, Sun Yat-sen University), Yichen LI (ByteDance), Zhuangbin Chen (Sun Yat-sen University), Jinyang Liu (ByteDance), Zhihan Jiang (The Chinese University of Hong Kong), Jianjun Chen (ByteDance), Rui Shi (ByteDance), Zibin Zheng (Sun Yat-sen University), Tieying Zhang (ByteDance)

16:30 10m Talk (NIER Track): Fault Injection for Simulink-based CPS Models: Insights and Future Directions
Drishti Yadav (University of Luxembourg, Luxembourg), Claudio Mandrioli (University of Luxembourg), Ezio Bartocci (TU Wien), Domenico Bianculli (University of Luxembourg)

16:40 10m Talk (NIER Track): How Does ChatGPT Make Assumptions When Creating Erroneous Programs?

16:50 10m Talk (NIER Track): Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
Ruofan Lu (The Chinese University of Hong Kong), Yichen LI (ByteDance), Yintong Huo (Singapore Management University, Singapore)