ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Wed 15 Apr 2026 11:15 - 11:30 at Oceania I - Testing and Analysis 3 Chair(s): Yvonne Dittrich

Raising test coverage at industry scale is difficult: engineers report a median of four minutes to produce a single covered line of code, making coverage targets expensive in large repositories. We present AutoCover, a production system that automatically generates, validates, and proposes tests using large language models. AutoCover supports three complementary interaction modes: a CLI for local scripting, a Headless mode for generating tests at scale across repository shards and creating merge requests, and an IDE mode for human-in-the-loop generation that captures developer intent and enables rapid fixes. Together, these modes support both legacy code in repositories and newly developed code on developer machines.

AutoCover is implemented as a modular, agentic pipeline built on LangGraph, with subgraphs for preparation, generation, execution, and validation or repair. Preparation includes scaffolding tests and running initial coverage, while generation uses multi-shot prompting informed by prior failures. Repository adapters provide language- and repository-specific actions such as import splicing, coverage commands, and coding conventions, and a code-context retriever supplies only relevant symbols to the LLM to stay within context limits. To reduce low-quality outputs, AutoCover combines intent-aware generation with validation gates such as coverage deltas and mutation or branch checks where available, along with flakiness defenses like multi-run CI simulation when compute permits.
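As a rough illustration of the generate, execute, and validate-or-repair loop described above, the following is a minimal sketch in plain Python. It is not AutoCover's implementation and does not use LangGraph. Every name here (`RunResult`, `Attempt`, `validate`, `generate_with_repair`, and the `generate` and `run_test` callables) is hypothetical. It assumes a coverage-delta gate measured in covered lines and a fixed number of reruns as a simple flakiness check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    """Outcome of one test execution (hypothetical shape)."""
    passed: bool
    covered_lines: int
    error: str = ""


@dataclass
class Attempt:
    """Final candidate test plus the failure reasons seen along the way."""
    test_code: str
    accepted: bool
    failures: List[str] = field(default_factory=list)


def validate(run_test: Callable[[str], RunResult], test_code: str,
             baseline_covered: int, reruns: int = 3) -> str:
    """Return "" if the test passes every gate, else a failure reason."""
    results = [run_test(test_code) for _ in range(reruns)]
    # Flakiness gate: all reruns must agree on pass/fail.
    if len({r.passed for r in results}) > 1:
        return "flaky: outcome differed across reruns"
    if not results[0].passed:
        return f"failed: {results[0].error}"
    # Coverage-delta gate: every rerun must cover more than the baseline.
    if min(r.covered_lines for r in results) <= baseline_covered:
        return "no coverage gain over baseline"
    return ""


def generate_with_repair(generate: Callable[[List[str]], str],
                         run_test: Callable[[str], RunResult],
                         baseline_covered: int,
                         max_attempts: int = 3) -> Attempt:
    """Regenerate a test until it passes the gates or attempts run out.

    `generate` receives the list of prior failure reasons, so each new
    attempt can be informed by what went wrong before.
    """
    failures: List[str] = []
    test_code = ""
    for _ in range(max_attempts):
        test_code = generate(list(failures))
        reason = validate(run_test, test_code, baseline_covered)
        if not reason:
            return Attempt(test_code, True, failures)
        failures.append(reason)
    return Attempt(test_code, False, failures)
```

In this sketch, `generate` stands in for the LLM call and `run_test` for a repository adapter's coverage command. A real system would likely add mutation or branch checks as further gates inside `validate`.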

This paper makes three contributions. First, it describes AutoCover’s end-to-end architecture and evolution, including design considerations around user experience, test quality, and cost. Second, it details interaction patterns across CLI, Headless, and IDE modes, and shows how intent collection and human-in-the-loop repair improve acceptance. Third, it reports both intrinsic and extrinsic evaluation results. AutoCover now generates about 11% of all new tests that are reviewed and added to CompanyX’s codebase.

Wed 15 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Testing and Analysis 3, SE In Practice (SEIP) / Research Track, at Oceania I
Chair(s): Yvonne Dittrich IT University of Copenhagen
11:00
15m
Talk
TestifAI: Tomography-Based Testing for Deep Learning Systems
Research Track
Arooj Arif Northeastern University London, Tobias Hartung Northeastern University London, Elena Botoeva University of Kent, Alexandros Koliousis Northeastern University London
11:15
15m
Talk
Automated Software Test Generation at Industry Scale Using a Multi-Agent Architecture and Workflow Integration (Virtual Attendance, Distinguished Paper Award)
SE In Practice (SEIP)
Matas Rastenis Uber Technologies Inc., Ben Chou Uber Technologies Inc., Shauvik Roy Choudhary Uber Technologies, Inc, René Just University of Washington
11:30
15m
Talk
On the Flakiness of LLM-generated Tests for Industrial and Open-Source Database Management Systems
SE In Practice (SEIP)
Alexander Berndt Heidelberg University, Thomas Bach SAP, Rainer Gemulla University of Mannheim, Marcus Kessel University of Mannheim, Sebastian Baltes Heidelberg University
11:45
15m
Talk
Enabling Black-box RPC-API Testing with Multi-Agent Reinforcement Learning and LLMs: An Industry Case Study
SE In Practice (SEIP)
Xiaoqing Sun Alibaba Cloud, Zhou Shao Alibaba Cloud, Xiaonan Shi Alibaba Cloud, Shiliang Xiao Alibaba Cloud, Chao Ma Alibaba Cloud, Xiaobo Xue Alibaba Cloud, Jianyuan Lu Alibaba Cloud, Shize Zhang Alibaba Cloud, Enge Song Alibaba Cloud, Song Yang Alibaba Cloud, Xing Li Zhejiang University and Alibaba Cloud, Chongrong Fang Shanghai Jiao Tong University, Chunrong Fang Nanjing University, Biao Lyu Alibaba Cloud, Shunmin Zhu Hangzhou Feitian Cloud and Alibaba Cloud
12:00
15m
Talk
Hamster: A Large-Scale Study and Characterization of Developer-Written Tests
SE In Practice (SEIP)
Rangeet Pan IBM Research, Tyler Stennett Georgia Institute of Technology, Raju Pavuluri IBM T.J. Watson Research Center, Nate Levin Georgia Institute of Technology, Alessandro Orso University of Georgia, USA, Saurabh Sinha IBM Research
12:15
15m
Talk
AutoOracle: High-Quality C++ Test Oracle Generation via Data Quality-Driven and Filtering-Enabled LLMs
SE In Practice (SEIP)
Cong Li Samsung R&D Institute China Xi'an, Samsung Electronics, Jong-In Jang Samsung Electronics, Yuqi Zhang Samsung R&D Institute China Xi'an, Samsung Electronics, Nakwon Lee Samsung Electronics, Bin Wang, Yinghua Zhang Samsung R&D Institute China Xi'an, Samsung Electronics, Chanwook Kim Samsung Electronics, Jia Zhang Samsung R&D Institute China Xi'an, Samsung Electronics, HyunSeok Kim Samsung Electronics, Xing He Samsung R&D Institute China Xi'an, Samsung Electronics, Kangho Roh Samsung Electronics, Seongjun Ahn Samsung Electronics