AgoneTest: Automated creation and assessment of Unit tests leveraging Large Language Models
Software correctness is crucial, with unit testing playing an indispensable role in the software development lifecycle. However, creating unit tests is time-consuming and costly, underlining the need for automation. Leveraging Large Language Models (LLMs) for unit test generation is a promising solution, but existing studies focus on simple, small-scale scenarios, leaving a gap in understanding LLMs’ performance in real-world applications, particularly regarding integration and assessment efficacy at scale. Here, we present \textsc{AgoneTest}, a system focused on automatically generating and evaluating complex class-level test suites. Our contributions include a scalable automated system, a newly developed dataset for rigorous evaluation, and a detailed methodology for test quality assessment.