Advancing Code Coverage: Incorporating Program Analysis with Large Language Models
This program is tentative and subject to change.
Automatic test generation plays a crucial role in software quality assurance by helping developers efficiently detect bugs. Search-Based Software Testing (SBST) techniques are among the most widely studied approaches, relying on heuristic search to explore the test space. While effective, SBST tools frequently fail to cover certain branches, especially those requiring scenario-specific values or deeper semantic reasoning. Recently, Large Language Models (LLMs) have shown promise in alleviating these limitations by leveraging their code comprehension abilities to generate meaningful tests. However, despite these advances, LLM-based approaches still struggle to cover hard-to-cover branches, leaving many branches to be handled manually. We define hard-to-cover branches as those that (1) require complex object construction, where valid inputs must be built through multi-step processes involving interdependent objects and specific attribute values, or (2) involve intricate inter-procedural dependencies, where the outcome of a branch condition depends on the chained execution of several methods. Existing LLM-based techniques are not equipped to handle such challenges: they achieve relatively low compilation success rates and lack sufficient semantic guidance when only given the code of the target method and limited context.
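For concreteness, a hypothetical Python example (not taken from the paper) illustrating both kinds of hard-to-cover branches: reaching the true branch of `apply_bonus` requires multi-step object construction and depends on a chained method call.

```python
class Account:
    """Toy class whose interesting states require multi-step construction."""

    def __init__(self, owner):
        self.owner = owner
        self.balance = 0.0
        self.verified = False

    def deposit(self, amount):
        self.balance += amount

    def verify(self, token):
        # Inter-procedural dependency: the flag set here depends on
        # the result of another method.
        self.verified = self._check_token(token)

    def _check_token(self, token):
        return token.startswith("VALID-")


def apply_bonus(account):
    # Hard-to-cover branch: the true branch requires an Account built
    # through a specific sequence (deposit, then verify with a
    # scenario-specific token); random search rarely finds it.
    if account.verified and account.balance > 100:
        return round(account.balance * 1.1, 2)
    return account.balance
```

A test reaching the true branch must chain `deposit(200)` and `verify("VALID-x")` before calling `apply_bonus`; this is exactly the kind of method invocation sequence that plain heuristic search rarely assembles by chance.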
To address these issues, we propose TELPA, a novel LLM-based test generation technique enhanced by program analysis. TELPA combines lightweight program analyses with LLM prompting to systematically guide test generation toward hard-to-cover branches. First, its object construction analysis collects method invocation sequences that lead to target methods, enabling the LLM to observe how valid complex objects are constructed in practice. Second, its branch dependency analysis identifies the methods involved in branch conditions and incorporates their code in a semantically meaningful order. This allows the LLM to reason about inter-procedural dependencies instead of being overwhelmed by irrelevant surrounding code. To ensure efficiency, TELPA is triggered only when lightweight SBST tools fail to improve coverage, and it integrates a feedback loop where counter-examples—previously ineffective tests—are added to prompts to guide the LLM toward unexplored solutions.
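The counter-example feedback loop described above can be sketched as follows. All names here (`build_prompt`, `generate_until_covered`, and the callback signatures) are illustrative assumptions, not TELPA's actual API:

```python
def build_prompt(target_src, invocation_sequences, dependency_srcs,
                 counter_examples):
    """Assemble an LLM prompt from program-analysis results and
    previously ineffective tests (counter-examples)."""
    parts = ["Generate a test covering the uncovered branch in:",
             target_src]
    if invocation_sequences:
        parts.append("Method invocation sequences that construct valid inputs:")
        parts.extend(invocation_sequences)
    if dependency_srcs:
        parts.append("Methods the branch condition depends on:")
        parts.extend(dependency_srcs)
    if counter_examples:
        parts.append("These tests did NOT cover the branch; try something different:")
        parts.extend(counter_examples)
    return "\n\n".join(parts)


def generate_until_covered(query_llm, is_effective, target_src,
                           invocation_sequences, dependency_srcs,
                           budget=5):
    """Query the LLM repeatedly, feeding failed attempts back as
    counter-examples, until a test covers the branch or the budget ends."""
    counter_examples = []
    for _ in range(budget):
        prompt = build_prompt(target_src, invocation_sequences,
                              dependency_srcs, counter_examples)
        candidate = query_llm(prompt)
        if is_effective(candidate):
            return candidate
        counter_examples.append(candidate)  # feed failure back into the prompt
    return None
```

Here `query_llm` and `is_effective` (e.g. a compile-and-run coverage check) are injected, which keeps the loop testable with stubs.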
We conducted an extensive evaluation of TELPA on 27 open-source Python projects widely used in prior studies. Results show that TELPA consistently outperforms state-of-the-art SBST (Pynguin) and LLM-based test generation tools (CODAMOSA, CHATTESTER). On average, TELPA achieves 34.10%, 25.93%, and 21.10% higher branch coverage than these baselines, respectively, under the same testing time budget. An ablation study further confirms the contribution of each major component: both object construction analysis and branch dependency analysis improve branch coverage, and counter-example feedback improves both the efficiency and the diversity of generated tests. These findings demonstrate that integrating program analysis with LLM prompting is a promising direction for overcoming the long-standing challenge of covering hard-to-cover branches.
Wed 19 Nov (displayed time zone: Seoul)
Session: 11:00 - 12:30

- 11:00 (10m talk, Research Papers): PALM: Synergizing Program Analysis and LLMs to Enhance Rust Unit Test Coverage.
- 11:10 (10m talk, Journal-First Track): ROR-DSE: ROR adequate test case generation using dynamic symbolic execution. Sangharatna Godboley (NIT Warangal).
- 11:20 (10m talk, Research Papers): Reflective Unit Test Generation for Precise Type Error Detection with Large Language Models. Chen Yang (Tianjin University), Ziqi Wang (Tianjin University), Yanjie Jiang (Peking University), Lin Yang (Tianjin University), Yuteng Zheng (Tianjin University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.), Junjie Chen (Tianjin University).
- 11:30 (10m talk, Research Papers): FailMapper: Automated Generation of Unit Tests Guided by Failure Scenarios. Ruiqi Dong (Swinburne University of Technology), Zehang Deng (Swinburne University of Technology), Xiaogang Zhu (The University of Adelaide), Xiaoning Du (Monash University), Huai Liu (Swinburne University of Technology), Shaohua Wang (Central University of Finance and Economics), Sheng Wen (Swinburne University of Technology), Yang Xiang (Digital Research & Innovation Capability Platform, Swinburne University of Technology).
- 11:40 (10m talk, Journal-First Track): Advancing Code Coverage: Incorporating Program Analysis with Large Language Models. Chen Yang (Tianjin University), Junjie Chen (Tianjin University), Bin Lin (Hangzhou Dianzi University), Ziqi Wang (Tianjin University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.).
- 11:50 (10m talk, Research Papers): Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models. Dianshu Liao (The Australian National University), Xin Yin (Zhejiang University), Shidong Pan (Columbia University & New York University), Chao Ni (Zhejiang University), Zhenchang Xing (CSIRO's Data61), Xiaoyu Sun (Australian National University, Australia). Pre-print available.
- 12:00 (10m talk, Research Papers): Enhancing LLM's Ability to Generate More Repository-Aware Unit Tests Through Precise Context Injection. Xin Yin (Zhejiang University), Chao Ni (Zhejiang University), Xinrui Li (School of Software Technology, Zhejiang University), Liushan Chen (Douyin Co., Ltd.), Guojun Ma (Douyin Co., Ltd.), Xiaohu Yang (Zhejiang University). Pre-print available.
- 12:10 (10m talk, Journal-First Track): Toward Cost-Effective Adaptive Random Testing: An Approximate Nearest Neighbor Approach. Rubing Huang (Macau University of Science and Technology, M.U.S.T.), Chenhui Cui (Macau University of Science and Technology), Junlong Lian (Jiangsu University), Haibo Chen (Jiangsu University), Dave Towey (University of Nottingham Ningbo China), Weifeng Sun.
- 12:20 (10m talk, Research Papers): Automated Combinatorial Test Generation for Alloy. Agustín Borda (Dept. of Computer Science FCEFQyN, University of Rio Cuarto), Germán Regis (University of Rio Cuarto and CONICET), Nazareno Aguirre (University of Rio Cuarto/CONICET, Argentina, and Guangdong Technion-Israel Institute of Technology, China), Marcelo F. Frias (Dept. of Software Engineering, Instituto Tecnológico de Buenos Aires), Pablo Ponzio (Dept. of Computer Science FCEFQyN, University of Rio Cuarto).