You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects
The ability to execute the test suite of a project is essential in many scenarios, e.g., to assess code quality and code coverage, to validate code changes made by developers or automated tools, and to ensure compatibility with dependencies. Despite its importance, executing the test suite of a project can be challenging in practice because different projects use different programming languages, software ecosystems, build systems, testing frameworks, and other tools. These challenges make it difficult to create a reliable, universal test execution method that works across different projects. This paper presents ExecutionAgent, an automated technique that prepares scripts for building an arbitrary project from source code and running its test cases. Inspired by the way a human developer would address this task, our approach is a large language model-based agent that autonomously executes commands and interacts with the host system. The agent uses meta-prompting to gather guidelines on the latest technologies related to the given project, and it iteratively refines its process based on feedback from the previous steps. Our evaluation applies ExecutionAgent to 50 open-source projects that use 14 different programming languages and many different build and testing tools. The approach successfully executes the test suites of 33/50 projects, while matching the test results of ground-truth test suite executions with a deviation of only 7.5%. These results improve over the best previously available technique by 6.6x. The costs imposed by the approach are reasonable: on average per project, an execution time of 74 minutes and LLM costs of 0.16 dollars. We envision ExecutionAgent serving as a valuable tool for developers, automated programming tools, and researchers who need to execute tests across a wide variety of projects.
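The abstract describes the agent's core loop: propose a command, execute it on the host system, and feed the outcome back into the next prompt, after an initial meta-prompting step that gathers technology-specific guidelines. The paper's actual implementation is not shown on this page; the Python sketch below is only an illustration of such a loop, and every name in it (query_llm, gather_guidelines, the step limit, the output truncation) is a hypothetical placeholder rather than ExecutionAgent's real interface.

```python
import subprocess

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; expected to return either the next shell
    command to run or the token DONE. Plug in a real LLM client here."""
    raise NotImplementedError("LLM client not configured")

def gather_guidelines(project_description: str) -> str:
    """Meta-prompting step (assumed shape): ask the LLM which build/test
    technologies apply to the project and what guidelines to follow."""
    return query_llm(
        "List installation and test-execution guidelines for a project "
        "described as follows:\n" + project_description
    )

def run_agent(project_dir: str, project_description: str, max_steps: int = 30) -> None:
    guidelines = gather_guidelines(project_description)
    history = []  # (command, exit code, truncated output) of previous steps
    for _ in range(max_steps):
        prompt = (
            "Goal: build the project and run its test suite.\n"
            f"Guidelines:\n{guidelines}\n"
            f"Previous steps:\n{history}\n"
            "Reply with the single next shell command, or DONE when the tests ran."
        )
        command = query_llm(prompt).strip()
        if command == "DONE":
            break
        # Execute the proposed command and feed its outcome back to the agent.
        try:
            result = subprocess.run(
                command, shell=True, cwd=project_dir,
                capture_output=True, text=True, timeout=600,
            )
            output, code = result.stdout + result.stderr, result.returncode
        except subprocess.TimeoutExpired:
            output, code = "command timed out", -1
        history.append((command, code, output[-2000:]))  # keep the prompt bounded
```

The sketch only conveys the iterative refine-from-feedback idea stated in the abstract; details such as prompt wording, containerization, and how the agent decides it is done are specific to the paper and not reproduced here.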
Thu 26 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.
11:00 - 12:30 | Test Automation, Evolution, and API Testing | Research Papers / Tool Demonstrations at Cosmos 3C | Chair(s): Alexi Turcotte (CISPA)
11:00 (25m) Talk | You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects | Research Papers | DOI
11:25 (25m) Talk | Effective REST APIs Testing with Error Message Analysis | Research Papers | Lixin Xu (Nanjing University, China), Huayao Wu (Nanjing University), Zhenyu Pan, Tongtong Xu (Huawei), Shaohua Wang (Central University of Finance and Economics), Xintao Niu (Nanjing University), Changhai Nie (Nanjing University) | DOI
11:50 (25m) Talk | REACCEPT: Automated Co-evolution of Production and Test Code Based on Dynamic Validation and Large Language Models | Research Papers | Jianlei Chi, Xiaotian Wang (Harbin Engineering University), Yuhan Huang (Xidian University), Lechen Yu (Microsoft), Di Cui (Xidian University), Jianguo Sun (Xidian University), Jun Sun (Singapore Management University) | DOI
12:15 (15m) Demonstration | PatchScope – A Modular Tool for Annotating and Analyzing Contributions | Tool Demonstrations | Jakub Narębski (Nicolaus Copernicus University in Toruń), Mikołaj Fejzer (Nicolaus Copernicus University in Toruń), Krzysztof Stencel (University of Warsaw), Piotr Przymus (Nicolaus Copernicus University in Toruń, Poland) | Link to publication | DOI
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.