Automated Repair of Ambiguous Problem Descriptions for LLM-Based Code Generation (ASE 2025 - Research Papers)

Who

Haoxiang Jia, Robbie Morris, He Ye, Federica Sarro, Sergey Mechtaev

Track

ASE 2025 Research Papers

When

Wed 19 Nov 2025 11:00 - 11:10 (GMT+09:00, Seoul) at Grand Hall 1 - Program Repair 2

Abstract

The widespread adoption of large language models (LLMs) in software engineering has amplified the role of natural language (NL). However, the inherent ambiguity of NL threatens software quality, because ambiguous requirements may lead to faulty program generation. The complexity of ambiguity detection and resolution motivates us to introduce the problem of automated repair of ambiguous NL requirements, which we approach by reducing code generation uncertainty and aligning NL with input-output examples.

Repairing ambiguity in requirements is a difficult challenge for LLMs, as it demands metacognition: the model must understand how its own interpretation changes when the text is altered. Our experiments show that directly prompting an LLM to detect and resolve ambiguities results in irrelevant or inconsistent clarifications. The key novelty we propose is a method of decomposing this problem into simpler sub-problems that do not require metacognitive reasoning. First, we analyze and repair the LLM’s interpretation of the requirements, as embodied by the distribution of programs they induce, using traditional testing and program repair methods. Second, we repair the requirements based on the changes to that distribution via what we refer to as contrastive specification inference. This decomposition enables targeted, minimal requirement repairs that yield cross-model performance gains in code generation.
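One way to make "the distribution of programs a requirement induces" concrete is to group sampled programs by their observable behaviour on a set of test inputs and measure how spread out the groups are. The sketch below is an illustration only, not SpecFix's implementation: the function names, the use of Shannon entropy as the uncertainty measure, and the toy "middle element" requirement are all assumptions introduced here.

```python
import math
from collections import Counter
from typing import Any, Callable, Sequence, Tuple


def behaviour_signature(program: Callable[[Any], Any],
                        inputs: Sequence[Any]) -> Tuple[str, ...]:
    """Summarise a program by its outputs (or exception types) on the inputs."""
    signature = []
    for value in inputs:
        try:
            signature.append(repr(program(value)))
        except Exception as exc:  # a crash is also observable behaviour
            signature.append(f"<{type(exc).__name__}>")
    return tuple(signature)


def interpretation_entropy(programs: Sequence[Callable[[Any], Any]],
                           inputs: Sequence[Any]) -> float:
    """Shannon entropy (in bits) over behaviourally equivalent program groups.

    0.0 means every sampled program behaves the same on the inputs;
    larger values mean the samples split across several interpretations.
    """
    counts = Counter(behaviour_signature(p, inputs) for p in programs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical requirement: "return the middle element of a list".
# For even-length lists this is ambiguous (upper or lower middle).
upper_middle = lambda xs: xs[len(xs) // 2]
lower_middle = lambda xs: xs[(len(xs) - 1) // 2]

test_inputs = [[1, 2, 3], [1, 2, 3, 4]]

# Stand-ins for programs sampled from an LLM before and after a repair.
samples_before = [upper_middle, lower_middle, upper_middle, lower_middle]
samples_after = [upper_middle, upper_middle, upper_middle, upper_middle]

entropy_before = interpretation_entropy(samples_before, test_inputs)
entropy_after = interpretation_entropy(samples_after, test_inputs)
```

In this toy setting the samples drawn from the ambiguous wording split evenly into two behaviour groups, while the samples drawn after a hypothetical repair collapse into one group. A repair step in this spirit would prefer rewordings that lower the measure while still agreeing with the available input-output examples.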

This approach is implemented as the tool SpecFix and evaluated using three state-of-the-art LLMs (GPT-4o, DeepSeek-V3, and Qwen2.5-Coder-32b-Instruct) across two widely used code generation benchmarks, HumanEval+ and MBPP+. Our results show that SpecFix, operating autonomously without human intervention or external information, modifies 23.93% of the requirements, leading to a 33.66% improvement in model Pass@1 on the modified requirements. Across the entire benchmark, this corresponds to an absolute increase of 4.3% in overall Pass@1.
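For readers unfamiliar with the metric, Pass@1 is the k = 1 case of pass@k, the probability that at least one of k sampled programs passes all tests for a problem. The sketch below uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. Whether the paper computes Pass@1 in exactly this way is an assumption; the sample counts in the example are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of programs sampled for the problem
    c: number of those programs that pass all tests
    k: sampling budget being evaluated (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up example: three problems with 10 samples each.
example_results = [(10, 3), (10, 10), (10, 0)]
example_score = benchmark_pass_at_k(example_results, k=1)
```

For k = 1 the estimator reduces to c / n per problem, so the benchmark score is simply the mean fraction of passing samples.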

Haoxiang Jia

Peking University

Robbie Morris

University College London

He Ye

University College London (UCL)

United Kingdom

Federica Sarro