Advancing Code Coverage: Incorporating Program Analysis with Large Language Models
This program is tentative and subject to change.
Automatic test generation plays a crucial role in software quality assurance by helping developers efficiently detect bugs. Search-Based Software Testing (SBST) techniques are among the most widely studied approaches, relying on heuristic search to explore the test space. While effective, SBST tools frequently fail to cover certain branches, especially those requiring scenario-specific values or deeper semantic reasoning. Recently, Large Language Models (LLMs) have shown promise in alleviating these limitations by leveraging their code comprehension abilities to generate meaningful tests. However, despite these advances, LLM-based approaches still struggle to cover hard-to-cover branches, leaving many branches to be handled manually. We define hard-to-cover branches as those that (1) require complex object construction, where valid inputs must be built through multi-step processes involving interdependent objects and specific attribute values, or (2) involve intricate inter-procedural dependencies, where the outcome of a branch condition depends on the chained execution of several methods. Existing LLM-based techniques are not equipped to handle such challenges: they achieve relatively low compilation success rates and lack sufficient semantic guidance when only given the code of the target method and limited context.
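For concreteness, a hypothetical Python example (not taken from the paper) illustrating both kinds of hard-to-cover branches: reaching the true branch of `apply_bonus` requires multi-step object construction and depends on a chained method call.

```python
class Account:
    """Toy class whose interesting states require multi-step construction."""

    def __init__(self, owner):
        self.owner = owner
        self.balance = 0.0
        self.verified = False

    def deposit(self, amount):
        self.balance += amount

    def verify(self, token):
        # Inter-procedural dependency: the flag set here depends on
        # the result of another method.
        self.verified = self._check_token(token)

    def _check_token(self, token):
        return token.startswith("VALID-")


def apply_bonus(account):
    # Hard-to-cover branch: the true branch requires an Account built
    # through a specific sequence (deposit, then verify with a
    # scenario-specific token); random search rarely finds it.
    if account.verified and account.balance > 100:
        return round(account.balance * 1.1, 2)
    return account.balance
```

A test reaching the true branch must chain `deposit(200)` and `verify("VALID-x")` before calling `apply_bonus`; this is exactly the kind of method invocation sequence that plain heuristic search rarely assembles by chance.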
To address these issues, we propose TELPA, a novel LLM-based test generation technique enhanced by program analysis. TELPA combines lightweight program analyses with LLM prompting to systematically guide test generation toward hard-to-cover branches. First, its object construction analysis collects method invocation sequences that lead to target methods, enabling the LLM to observe how valid complex objects are constructed in practice. Second, its branch dependency analysis identifies the methods involved in branch conditions and incorporates their code in a semantically meaningful order. This allows the LLM to reason about inter-procedural dependencies instead of being overwhelmed by irrelevant surrounding code. To ensure efficiency, TELPA is triggered only when lightweight SBST tools fail to improve coverage, and it integrates a feedback loop where counter-examples—previously ineffective tests—are added to prompts to guide the LLM toward unexplored solutions.
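The counter-example feedback loop described above can be sketched as follows. All names here (`build_prompt`, `generate_until_covered`, and the callback signatures) are illustrative assumptions, not TELPA's actual API:

```python
def build_prompt(target_src, invocation_sequences, dependency_srcs,
                 counter_examples):
    """Assemble an LLM prompt from program-analysis results and
    previously ineffective tests (counter-examples)."""
    parts = ["Generate a test covering the uncovered branch in:",
             target_src]
    if invocation_sequences:
        parts.append("Method invocation sequences that construct valid inputs:")
        parts.extend(invocation_sequences)
    if dependency_srcs:
        parts.append("Methods the branch condition depends on:")
        parts.extend(dependency_srcs)
    if counter_examples:
        parts.append("These tests did NOT cover the branch; try something different:")
        parts.extend(counter_examples)
    return "\n\n".join(parts)


def generate_until_covered(query_llm, is_effective, target_src,
                           invocation_sequences, dependency_srcs,
                           budget=5):
    """Query the LLM repeatedly, feeding failed attempts back as
    counter-examples, until a test covers the branch or the budget ends."""
    counter_examples = []
    for _ in range(budget):
        prompt = build_prompt(target_src, invocation_sequences,
                              dependency_srcs, counter_examples)
        candidate = query_llm(prompt)
        if is_effective(candidate):
            return candidate
        counter_examples.append(candidate)  # feed failure back into the prompt
    return None
```

Here `query_llm` and `is_effective` (e.g. a compile-and-run coverage check) are injected, which keeps the loop testable with stubs.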
We conducted an extensive evaluation of TELPA on 27 open-source Python projects widely used in prior studies. Results show that TELPA consistently outperforms state-of-the-art SBST (Pynguin) and LLM-based test generation tools (CODAMOSA, CHATTESTER). On average, TELPA achieves 34.10%, 25.93%, and 21.10% higher branch coverage than these baselines, respectively, under the same testing time budget. An ablation study further confirms the contribution of each major component: both object construction analysis and branch dependency analysis improve branch coverage, and counter-example feedback improves both the efficiency and the diversity of generated tests. These findings demonstrate that integrating program analysis with LLM prompting is a promising direction for overcoming the long-standing challenge of covering hard-to-cover branches.
Wed 19 Nov (displayed time zone: Seoul)
Session: 11:00 - 12:30

- 11:00 (10m talk, Research Papers): PALM: Synergizing Program Analysis and LLMs to Enhance Rust Unit Test Coverage.
- 11:10 (10m talk, Journal-First Track): ROR-DSE: ROR adequate test case generation using dynamic symbolic execution. Sangharatna Godboley (NIT Warangal).
- 11:20 (10m talk, Research Papers): Reflective Unit Test Generation for Precise Type Error Detection with Large Language Models. Chen Yang (Tianjin University), Ziqi Wang (Tianjin University), Yanjie Jiang (Peking University), Lin Yang (Tianjin University), Yuteng Zheng (Tianjin University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.), Junjie Chen (Tianjin University).
- 11:30 (10m talk, Research Papers): FailMapper: Automated Generation of Unit Tests Guided by Failure Scenarios. Ruiqi Dong (Swinburne University of Technology), Zehang Deng (Swinburne University of Technology), Xiaogang Zhu (The University of Adelaide), Xiaoning Du (Monash University), Huai Liu (Swinburne University of Technology), Shaohua Wang (Central University of Finance and Economics), Sheng Wen (Swinburne University of Technology), Yang Xiang (Digital Research & Innovation Capability Platform, Swinburne University of Technology).
- 11:40 (10m talk, Journal-First Track): Advancing Code Coverage: Incorporating Program Analysis with Large Language Models. Chen Yang (Tianjin University), Junjie Chen (Tianjin University), Bin Lin (Hangzhou Dianzi University), Ziqi Wang (Tianjin University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.).
- 11:50 (10m talk, Research Papers): Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models. Dianshu Liao (The Australian National University), Xin Yin (Zhejiang University), Shidong Pan (Columbia University & New York University), Chao Ni (Zhejiang University), Zhenchang Xing (CSIRO's Data61), Xiaoyu Sun (Australian National University, Australia). Pre-print available.
- 12:00 (10m talk, Research Papers): Enhancing LLM's Ability to Generate More Repository-Aware Unit Tests Through Precise Context Injection. Xin Yin (Zhejiang University), Chao Ni (Zhejiang University), Xinrui Li (School of Software Technology, Zhejiang University), Liushan Chen (Douyin Co., Ltd.), Guojun Ma (Douyin Co., Ltd.), Xiaohu Yang (Zhejiang University). Pre-print available.
- 12:10 (10m talk, Journal-First Track): Toward Cost-Effective Adaptive Random Testing: An Approximate Nearest Neighbor Approach. Rubing Huang (Macau University of Science and Technology, M.U.S.T.), Chenhui Cui (Macau University of Science and Technology), Junlong Lian (Jiangsu University), Haibo Chen (Jiangsu University), Dave Towey (University of Nottingham Ningbo China), Weifeng Sun.
- 12:20 (10m talk, Research Papers): Automated Combinatorial Test Generation for Alloy. Agustín Borda (Dept. of Computer Science FCEFQyN, University of Rio Cuarto), Germán Regis (University of Rio Cuarto and CONICET), Nazareno Aguirre (University of Rio Cuarto/CONICET, Argentina, and Guangdong Technion-Israel Institute of Technology, China), Marcelo F. Frias (Dept. of Software Engineering, Instituto Tecnológico de Buenos Aires), Pablo Ponzio (Dept. of Computer Science FCEFQyN, University of Rio Cuarto).