ConTested: Consistency-Aided Tested Code Generation with LLM
Recent advancements in large language models (LLMs) have significantly improved code generation, i.e., the automatic generation of code snippets from natural-language requirements. Despite achieving state-of-the-art performance, LLMs often struggle to generate accurate and reliable code, requiring developers to spend substantial effort debugging and evaluating the generated output. Researchers have proposed leveraging consistency to select code that passes more tests (inter-consistency) and behaves consistently with more of its counterparts (intra-consistency). However, since the tests themselves are also generated by LLMs, majority voting based on incorrect tests yields unreliable results. To address this, we propose ConTested, a lightweight interaction framework that incorporates user feedback to effectively guide consistency. Our results demonstrate that, with minimal human effort, performance can be significantly improved. In each iteration, ConTested applies a rank-correct-fix co-evolution process between code and tests, which iteratively improves the quality of both and makes the consistency voting between code and tests more reliable.
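To make the consistency idea concrete, the following is a minimal sketch of ranking LLM-generated code candidates by inter-consistency (fraction of generated tests passed) and intra-consistency (output agreement with peer candidates). The function names (`solve`, `rank_candidates`), the weighting parameter `alpha`, and the toy task are illustrative assumptions; this is not the paper's implementation, which additionally incorporates user feedback and the rank-correct-fix co-evolution loop.

```python
# Sketch of consistency-based voting between code candidates and tests,
# both assumed to come from an LLM. All names here are hypothetical.

def passes(candidate_src: str, test_src: str) -> bool:
    """Return True if the candidate source passes the assert-style test."""
    env = {}
    try:
        exec(candidate_src, env)   # define the candidate function
        exec(test_src, env)        # run the test against it
        return True
    except Exception:
        return False

def agrees(src_a: str, src_b: str, inputs) -> float:
    """Fraction of probe inputs on which two candidates produce the same output."""
    env_a, env_b = {}, {}
    exec(src_a, env_a)
    exec(src_b, env_b)
    same = 0
    for x in inputs:
        try:
            if env_a["solve"](x) == env_b["solve"](x):
                same += 1
        except Exception:
            pass
    return same / len(inputs)

def rank_candidates(candidates, tests, probe_inputs, alpha=0.5):
    """Score = alpha * inter-consistency + (1 - alpha) * intra-consistency."""
    scored = []
    for i, cand in enumerate(candidates):
        inter = sum(passes(cand, t) for t in tests) / max(len(tests), 1)
        peers = [c for j, c in enumerate(candidates) if j != i]
        intra = sum(agrees(cand, p, probe_inputs) for p in peers) / max(len(peers), 1)
        scored.append((alpha * inter + (1 - alpha) * intra, cand))
    return sorted(scored, key=lambda s: s[0], reverse=True)

if __name__ == "__main__":
    # Toy task: solve(x) should return the absolute value of x.
    candidates = [
        "def solve(x):\n    return abs(x)",
        "def solve(x):\n    return x",   # buggy candidate
    ]
    tests = [
        "assert solve(-3) == 3",
        "assert solve(2) == 2",
        "assert solve(0) == 1",          # incorrect LLM-generated test
    ]
    for score, cand in rank_candidates(candidates, tests, probe_inputs=[-5, -1, 0, 4]):
        print(f"{score:.2f}  {cand.splitlines()[0]} ...")
```

As the toy example shows, a single incorrect generated test can distort purely automatic voting; ConTested's point is that a small amount of user feedback on ranked code and tests can correct such errors before the next co-evolution round.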
We evaluate ConTested through extensive experiments, demonstrating its effectiveness across multiple LLMs, including GPT-3.5 and GPT-4o. Our results show improvements of 32.9% over GPT-3.5 and 16.97% over GPT-4o. Additionally, ConTested achieves an 11.1% improvement over MPSC, the state-of-the-art post-processing technique. This improvement requires only four rounds of interaction with users, demanding minimal user effort. A user study further confirms the feasibility and cost-effectiveness of ConTested, highlighting its ability to enhance code generation without introducing substantial overhead.
Thu 26 Jun | Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
16:00 - 17:15 | Code Generation with LLMs (Research Papers) at Cosmos 3C | Chair(s): Yutian Tang (University of Glasgow, United Kingdom)
16:00 | 25m Talk | OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution (Research Papers) | Lianghong Guo (Sun Yat-sen University), Wei Tao (Independent Researcher), Runhan Jiang (Sun Yat-sen University), Yanlin Wang (Sun Yat-sen University), Jiachi Chen (Sun Yat-sen University), Xilin Liu (Huawei Cloud), Yuchi Ma (Huawei Cloud Computing Technologies), Mingzhi Mao (Sun Yat-sen University), Hongyu Zhang (Chongqing University), Zibin Zheng (Sun Yat-sen University) | DOI
16:25 | 25m Talk | ConTested: Consistency-Aided Tested Code Generation with LLM (Research Papers) | Jinhao Dong (Peking University), Jun Sun (Singapore Management University), Wenjie Zhang (National University of Singapore), Jin Song Dong (National University of Singapore), Dan Hao (Peking University) | DOI | Pre-print
16:50 | 25m Talk | Causality-Aided Evaluation and Explanation of Large Language Model-based Code Generation (Research Papers) | Zhenlan Ji (The Hong Kong University of Science and Technology), Pingchuan Ma (HKUST), Li Zongjie (Hong Kong University of Science and Technology), Zhaoyu Wang (HKUST), Shuai Wang (Hong Kong University of Science and Technology) | DOI
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.