You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects
The ability to execute the test suite of a project is essential in many scenarios, e.g., to assess code quality and code coverage, to validate code changes made by developers or automated tools, and to ensure compatibility with dependencies. Despite its importance, executing the test suite of a project can be challenging in practice because different projects use different programming languages, software ecosystems, build systems, testing frameworks, and other tools. These challenges make it difficult to create a reliable, universal test execution method that works across different projects. This paper presents ExecutionAgent, an automated technique that prepares scripts for building an arbitrary project from source code and running its test cases. Inspired by the way a human developer would address this task, our approach is a large language model-based agent that autonomously executes commands and interacts with the host system. The agent uses meta-prompting to gather guidelines on the latest technologies related to the given project, and it iteratively refines its process based on feedback from the previous steps. Our evaluation applies ExecutionAgent to 50 open-source projects that use 14 different programming languages and many different build and testing tools. The approach successfully executes the test suites of 33/50 projects, while matching the test results of ground-truth test suite executions with a deviation of only 7.5%. These results improve over the best previously available technique by 6.6x. The costs imposed by the approach are reasonable: on average per project, an execution time of 74 minutes and LLM costs of 0.16 dollars. We envision ExecutionAgent serving as a valuable tool for developers, automated programming tools, and researchers who need to execute tests across a wide variety of projects.
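The abstract describes the agent's core loop: propose a command, execute it on the host system, and feed the outcome back into the next prompt, after an initial meta-prompting step that gathers technology-specific guidelines. The paper's actual implementation is not shown on this page; the Python sketch below is only an illustration of such a loop, and every name in it (query_llm, gather_guidelines, the step limit, the output truncation) is a hypothetical placeholder rather than ExecutionAgent's real interface.

```python
import subprocess

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; expected to return either the next shell
    command to run or the token DONE. Plug in a real LLM client here."""
    raise NotImplementedError("LLM client not configured")

def gather_guidelines(project_description: str) -> str:
    """Meta-prompting step (assumed shape): ask the LLM which build/test
    technologies apply to the project and what guidelines to follow."""
    return query_llm(
        "List installation and test-execution guidelines for a project "
        "described as follows:\n" + project_description
    )

def run_agent(project_dir: str, project_description: str, max_steps: int = 30) -> None:
    guidelines = gather_guidelines(project_description)
    history = []  # (command, exit code, truncated output) of previous steps
    for _ in range(max_steps):
        prompt = (
            "Goal: build the project and run its test suite.\n"
            f"Guidelines:\n{guidelines}\n"
            f"Previous steps:\n{history}\n"
            "Reply with the single next shell command, or DONE when the tests ran."
        )
        command = query_llm(prompt).strip()
        if command == "DONE":
            break
        # Execute the proposed command and feed its outcome back to the agent.
        try:
            result = subprocess.run(
                command, shell=True, cwd=project_dir,
                capture_output=True, text=True, timeout=600,
            )
            output, code = result.stdout + result.stderr, result.returncode
        except subprocess.TimeoutExpired:
            output, code = "command timed out", -1
        history.append((command, code, output[-2000:]))  # keep the prompt bounded
```

The sketch only conveys the iterative refine-from-feedback idea stated in the abstract; details such as prompt wording, containerization, and how the agent decides it is done are specific to the paper and not reproduced here.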
Thu 26 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.
11:00 - 12:30 | Test Automation, Evolution, and API Testing | Research Papers / Tool Demonstrations at Cosmos 3C | Chair(s): Alexi Turcotte (CISPA)
11:00 (25m) Talk | You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects | Research Papers | DOI
11:25 (25m) Talk | Effective REST APIs Testing with Error Message Analysis | Research Papers | Lixin Xu (Nanjing University, China), Huayao Wu (Nanjing University), Zhenyu Pan, Tongtong Xu (Huawei), Shaohua Wang (Central University of Finance and Economics), Xintao Niu (Nanjing University), Changhai Nie (Nanjing University) | DOI
11:50 (25m) Talk | REACCEPT: Automated Co-evolution of Production and Test Code Based on Dynamic Validation and Large Language Models | Research Papers | Jianlei Chi, Xiaotian Wang (Harbin Engineering University), Yuhan Huang (Xidian University), Lechen Yu (Microsoft), Di Cui (Xidian University), Jianguo Sun (Xidian University), Jun Sun (Singapore Management University) | DOI
12:15 (15m) Demonstration | PatchScope – A Modular Tool for Annotating and Analyzing Contributions | Tool Demonstrations | Jakub Narębski (Nicolaus Copernicus University in Toruń), Mikołaj Fejzer (Nicolaus Copernicus University in Toruń), Krzysztof Stencel (University of Warsaw), Piotr Przymus (Nicolaus Copernicus University in Toruń, Poland) | Link to publication | DOI
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.