ConTested: Consistency-Aided Tested Code Generation with LLM
Recent advancements in large language models (LLMs) have significantly improved code generation, i.e., the automatic generation of code snippets from natural-language requirements. Despite achieving state-of-the-art performance, LLMs often struggle to generate accurate and reliable code, requiring developers to spend substantial effort debugging and evaluating the generated output. Researchers have proposed leveraging consistency to select code that passes more tests (inter-consistency) and behaves consistently with more of its counterparts (intra-consistency). However, since the tests themselves are also generated by LLMs, majority voting based on incorrect tests yields unreliable results. To address this, we propose ConTested, a lightweight interaction framework that incorporates user feedback to effectively guide consistency. Our results demonstrate that, with minimal human effort, performance can be significantly improved. In each iteration, ConTested applies a rank-correct-fix co-evolution process between code and tests, which iteratively improves the quality of both and makes the consistency voting between code and tests more reliable.
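To make the consistency idea concrete, the following is a minimal sketch of ranking LLM-generated code candidates by inter-consistency (fraction of generated tests passed) and intra-consistency (output agreement with peer candidates). The function names (`solve`, `rank_candidates`), the weighting parameter `alpha`, and the toy task are illustrative assumptions; this is not the paper's implementation, which additionally incorporates user feedback and the rank-correct-fix co-evolution loop.

```python
# Sketch of consistency-based voting between code candidates and tests,
# both assumed to come from an LLM. All names here are hypothetical.

def passes(candidate_src: str, test_src: str) -> bool:
    """Return True if the candidate source passes the assert-style test."""
    env = {}
    try:
        exec(candidate_src, env)   # define the candidate function
        exec(test_src, env)        # run the test against it
        return True
    except Exception:
        return False

def agrees(src_a: str, src_b: str, inputs) -> float:
    """Fraction of probe inputs on which two candidates produce the same output."""
    env_a, env_b = {}, {}
    exec(src_a, env_a)
    exec(src_b, env_b)
    same = 0
    for x in inputs:
        try:
            if env_a["solve"](x) == env_b["solve"](x):
                same += 1
        except Exception:
            pass
    return same / len(inputs)

def rank_candidates(candidates, tests, probe_inputs, alpha=0.5):
    """Score = alpha * inter-consistency + (1 - alpha) * intra-consistency."""
    scored = []
    for i, cand in enumerate(candidates):
        inter = sum(passes(cand, t) for t in tests) / max(len(tests), 1)
        peers = [c for j, c in enumerate(candidates) if j != i]
        intra = sum(agrees(cand, p, probe_inputs) for p in peers) / max(len(peers), 1)
        scored.append((alpha * inter + (1 - alpha) * intra, cand))
    return sorted(scored, key=lambda s: s[0], reverse=True)

if __name__ == "__main__":
    # Toy task: solve(x) should return the absolute value of x.
    candidates = [
        "def solve(x):\n    return abs(x)",
        "def solve(x):\n    return x",   # buggy candidate
    ]
    tests = [
        "assert solve(-3) == 3",
        "assert solve(2) == 2",
        "assert solve(0) == 1",          # incorrect LLM-generated test
    ]
    for score, cand in rank_candidates(candidates, tests, probe_inputs=[-5, -1, 0, 4]):
        print(f"{score:.2f}  {cand.splitlines()[0]} ...")
```

As the toy example shows, a single incorrect generated test can distort purely automatic voting; ConTested's point is that a small amount of user feedback on ranked code and tests can correct such errors before the next co-evolution round.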
We evaluate ConTested through extensive experiments, demonstrating its effectiveness across multiple LLMs, including GPT-3.5 and GPT-4o. Our results show improvements of 32.9% over GPT-3.5 and 16.97% over GPT-4o. Additionally, ConTested achieves an 11.1% improvement over MPSC, the state-of-the-art post-processing technique. This improvement requires only four rounds of interaction with users, demanding minimal user effort. A user study further confirms the feasibility and cost-effectiveness of ConTested, highlighting its ability to enhance code generation without introducing substantial overhead.
Thu 26 Jun | Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
16:00 - 17:15 | Code Generation with LLMs (Research Papers) at Cosmos 3C | Chair(s): Yutian Tang (University of Glasgow, United Kingdom)
16:00 | 25m Talk | OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution (Research Papers) | Lianghong Guo (Sun Yat-sen University), Wei Tao (Independent Researcher), Runhan Jiang (Sun Yat-sen University), Yanlin Wang (Sun Yat-sen University), Jiachi Chen (Sun Yat-sen University), Xilin Liu (Huawei Cloud), Yuchi Ma (Huawei Cloud Computing Technologies), Mingzhi Mao (Sun Yat-sen University), Hongyu Zhang (Chongqing University), Zibin Zheng (Sun Yat-sen University) | DOI
16:25 | 25m Talk | ConTested: Consistency-Aided Tested Code Generation with LLM (Research Papers) | Jinhao Dong (Peking University), Jun Sun (Singapore Management University), Wenjie Zhang (National University of Singapore), Jin Song Dong (National University of Singapore), Dan Hao (Peking University) | DOI | Pre-print
16:50 | 25m Talk | Causality-Aided Evaluation and Explanation of Large Language Model-based Code Generation (Research Papers) | Zhenlan Ji (The Hong Kong University of Science and Technology), Pingchuan Ma (HKUST), Li Zongjie (Hong Kong University of Science and Technology), Zhaoyu Wang (HKUST), Shuai Wang (Hong Kong University of Science and Technology) | DOI
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.