ASE 2024
Sun 27 October - Fri 1 November 2024, Sacramento, California, United States

Software correctness is crucial, and unit testing plays an indispensable role in the software development lifecycle. However, creating unit tests is time-consuming and costly, underlining the need for automation. Leveraging Large Language Models (LLMs) for unit test generation is a promising solution, but existing studies focus on simple, small-scale scenarios, leaving a gap in understanding how LLMs perform in real-world applications, particularly with respect to integration and the efficacy of assessment at scale. Here, we present AgoneTest, a system for automatically generating and evaluating complex class-level test suites. Our contributions include a scalable automated system, a newly developed dataset for rigorous evaluation, and a detailed methodology for assessing test quality.