FSE 2026
Sun 5 - Thu 9 July 2026 Montreal, Canada

This program is tentative and subject to change.

Tue 7 Jul 2026 15:10 - 15:30 at MB 3.435 - Test generation 1

Automatically generating bug reproduction tests (BRT) from issue descriptions is crucial for software maintenance. Large Language Model (LLM)-based approaches have shown great potential for this task, but their effectiveness depends heavily on retrieving high-quality context from the codebase. Existing approaches retrieve this context either with traditional methods such as BM25 or with LLM-driven strategies. The LLM-driven strategies typically equip an LLM with tools to explore the repository autonomously, or have it select the most relevant files and code snippets from a provided list.

However, these retrieval methods suffer from three key limitations: (1) they often apply a single strategy to retrieve both source code and test cases, overlooking their distinct retrieval requirements; (2) they rely solely on semantic similarity and ignore the function call relationships that reflect behavioral relevance, which often leads to the retrieval of irrelevant context; and (3) retrieval receives no feedback from the generation phase, so the context cannot be refined based on execution results. Together, these limitations yield low-quality context and hinder the accuracy of bug reproduction.

To address these challenges, we propose iCoRe, an iterative, correlation-aware context retrieval approach. iCoRe is explicitly designed to be aware of three key correlations: (1) the correlation between source code and test cases, which calls for differentiated retrieval; (2) the correlation between textual semantics and function call structures, for accurate relevance assessment; and (3) the correlation between the retrieval and generation phases, which enables iterative feedback and refinement. To evaluate iCoRe, we integrate it with an LLM-based BRT generator and conduct a comprehensive evaluation on the SWT-bench Lite benchmark.
Experimental results show that our method achieves a Fail-to-Pass rate of 42.0%, representing a significant 31.7% relative improvement over existing retrieval methods.
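The abstract describes iCoRe only at a high level. The toy Python sketch below illustrates the general kind of retrieval loop it outlines; it is not iCoRe's implementation. Jaccard token overlap stands in for BM25 or semantic similarity, breadth-first hop distance over a call graph stands in for call-relationship relevance, and the weights `alpha`, the cutoffs `k_src`/`k_test`, the `feedback` callable, and the toy repository are all hypothetical. Source functions and tests are ranked separately with different weights, and function names reported by a (simulated) failing test run become new call-graph seeds for the next round.

```python
import re
from collections import deque


def tokens(text):
    """Lower-cased identifier/word tokens of a code snippet or issue text."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def lexical_score(query, doc):
    """Jaccard token overlap: a crude stand-in for BM25 or embedding similarity."""
    q, d = tokens(query), tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def call_distances(call_graph, seeds):
    """BFS hop distance from the seed functions, ignoring call direction."""
    neighbors = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            neighbors.setdefault(caller, set()).add(callee)
            neighbors.setdefault(callee, set()).add(caller)
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def rank(query, corpus, call_graph, seeds, alpha, k):
    """Top-k names by alpha * textual score + (1 - alpha) * structural score."""
    dist = call_distances(call_graph, seeds)

    def score(name):
        structural = 1.0 / (1 + dist[name]) if name in dist else 0.0
        return alpha * lexical_score(query, corpus[name]) + (1 - alpha) * structural

    return sorted(corpus, key=lambda n: (-score(n), n))[:k]


def retrieve(issue, sources, tests, call_graph, seeds, feedback,
             max_rounds=3, k_src=2, k_test=1):
    """Rank source and test candidates separately, then refine with feedback.

    `feedback` is a hypothetical callable that runs a generated test with the
    given context and returns function names seen in the failure trace.
    Requires max_rounds >= 1; returns the context of the last round run.
    """
    seeds = set(seeds)
    for _ in range(max_rounds):
        # Hypothetical weights: tests lean more on call structure than text.
        ctx = (rank(issue, sources, call_graph, seeds, alpha=0.6, k=k_src),
               rank(issue, tests, call_graph, seeds, alpha=0.3, k=k_test))
        new = feedback(ctx) - seeds
        if not new:  # no new functions surfaced: stop iterating
            break
        seeds |= new
    return ctx


# Toy repository (hypothetical): snippets keyed by function name.
issue = "parse_date crashes on empty string"
sources = {
    "parse_date": "def parse_date(s): return _split(s)",
    "_split": "def _split(s): return s.split('-')",
    "format_date": "def format_date(d): return str(d)",
    "validate": "def validate(s): return bool(s)",
}
tests = {
    "test_parse_date_iso": "def test_parse_date_iso(): assert parse_date('2020-01-01')",
    "test_format_date": "def test_format_date(): assert format_date(1)",
}
call_graph = {
    "parse_date": {"_split"},
    "test_parse_date_iso": {"parse_date"},
    "test_format_date": {"format_date"},
}

# Simulated feedback: pretend the failing trace mentions validate().
print(retrieve(issue, sources, tests, call_graph, {"parse_date"},
               feedback=lambda ctx: {"validate"}))
# → (['parse_date', 'validate'], ['test_parse_date_iso'])
```

In this toy run the first round returns `parse_date` and `_split`; the simulated feedback then surfaces `validate`, which the static call graph does not connect to `parse_date`, so the second round swaps it in for `_split`. Giving tests a lower `alpha` is just one hypothetical way to express that relevant tests are found mainly through the functions they call.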


Tue 7 Jul

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
14:00
10m
Talk
TestAgent: A Multi-Agent LLM Framework for Repository-Level Unit Test Generation
Tool Demonstrations
Ye Shang Nanjing University, Quanjun Zhang Nanjing University of Science and Technology, Zhengyu Zhan Nanjing University, Ke Huang Nanjing University, Chunrong Fang Nanjing University, Zhenyu Chen Nanjing University
14:10
20m
Talk
Just-in-Time Catching Test Generation at Meta
Industry Papers
Mark Harman Meta Platforms, Inc. and UCL, Matthew Becker Meta, Yifei Chen Meta, Nicholas Cochran Meta, Pouyan Ghasemi Meta, Abhishek Gulati Meta Platforms, Mehrdad Honarkhah Meta, Hervé Robert Meta Platforms, Jiacheng Liu Meta, Weini Liu Meta, Sreeja Thummala Meta, Xiaoning Yang Meta, Rui Xin Meta, Sophie Zeng Meta, Zac Haluza Meta
14:30
20m
Talk
Understanding and Mitigating Hallucinations in Industrial LLM-based Unit Test Generation
Industry Papers
Yanlun Tu Ant Group, Ziyue Zhou University of Electronic Science and Technology of China, Cheng Xu Ant Group, Jingling Sun University of Electronic Science and Technology of China, Shuai Feng Ant Group, Chengyu Zhang Loughborough University
14:50
20m
Talk
Directed Grammar-Based Test Generation
Journal-First Paper
Lukas Kirschner Saarland University, Ezekiel Soremekun Singapore University of Technology and Design
15:10
20m
Talk
iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation
Research Papers
Junyi Wang Zhejiang University, Jialun Cao Hong Kong University of Science and Technology, Zhongxin Liu Zhejiang University