Sat 3 May 2025 16:30 - 17:00 at 213 - Paper Presentation 3 Chair(s): Matteo Biagiola

Software testing is a crucial but time-consuming aspect of software development, and Large Language Models (LLMs) have recently gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose Reinforcement Learning from Static Quality Metrics (RLSQM), in which we use Reinforcement Learning to generate high-quality unit tests based on static analysis-based quality metrics. First, we analyzed LLM-generated tests and showed that LLMs frequently generate undesirable test smells, up to 37% of the time. Then, we implemented a lightweight static analysis-based reward model and trained LLMs with it to optimize for five code quality metrics. Our experimental results demonstrate that the RL-optimized Codex model consistently generated higher-quality test cases than the base LLM, improving quality metrics by up to 23%, and generated nearly 100% syntactically correct code. RLSQM also outperformed GPT-4 on all code quality metrics, despite training a substantially cheaper Codex model. We provide insights into how to reliably use RL to improve test generation quality and show that RLSQM is a significant step towards enhancing the overall efficiency and reliability of automated software testing.

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Paper Presentation 3 DeepTest at 213
Chair(s): Matteo Biagiola Università della Svizzera italiana
16:00
30m
Talk
OpenCat: Improving Interoperability of ADS Testing
DeepTest
Qurban Ali University of Milano-Bicocca, Andrea Stocco Technical University of Munich, fortiss, Leonardo Mariani University of Milano-Bicocca, Oliviero Riganelli University of Milano-Bicocca
Pre-print
16:30
30m
Talk
Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
DeepTest
Benjamin Steenhoek Microsoft, Michele Tufano Google, Neel Sundaresan Microsoft, Alexey Svyatkovskiy Google DeepMind