ICSE 2026 Posters
Evaluating LLM-Based Test Refactoring with a Behavior-Preserving Composite Metric
Wed 15 Apr 2026 13:30 - 14:00 at Catering and Exhibition Hall (Europa I to IV) - Poster Session 1 Chair(s): Rohit Gheyi, Grischa Liebel
Thu 16 Apr 2026 15:30 - 16:00 at Catering and Exhibition Hall (Europa I to IV) - Poster Session 4 Chair(s): Rohit Gheyi, Grischa Liebel
Large Language Models (LLMs) are increasingly used to refactor automatically generated unit tests, which often achieve high coverage but suffer from poor readability and maintainability. Evaluating such refactorings remains challenging, as existing metrics either penalize beneficial edits or overlook readability. We introduce CTSES, a composite metric combining CodeBLEU, METEOR, and ROUGE-L to balance semantics, readability, and structure. Experiments on 5,000+ refactorings from Defects4J and SF110 show that CTSES reduces false negatives and provides more interpretable evaluation.
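The abstract does not state how CTSES combines its three components, so the following is only a minimal sketch of what a composite of this kind could look like. It assumes a weighted arithmetic mean with hypothetical weights; the function names `rouge_l_f1` and `composite_score` are illustrative and not taken from the authors' work. ROUGE-L is computed here as an F1 score over the longest common subsequence of tokens, while the CodeBLEU and METEOR values are assumed to be precomputed by external tools and passed in as numbers in [0, 1].

```python
from typing import Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """ROUGE-L as an F1 score (beta = 1) over token-level LCS."""
    if not reference or not candidate:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def composite_score(
    codebleu: float,
    meteor: float,
    rouge_l: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted mean of three scores in [0, 1].

    ASSUMPTION: both the weighted-mean form and the default equal weights
    are placeholders; the source does not specify the actual combination.
    """
    scores = (codebleu, meteor, rouge_l)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each component score must lie in [0, 1]")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Hypothetical original and refactored test bodies, whitespace-tokenized.
    original = "int v0 = calc . add ( 2 , 2 ) ; assertEquals ( 4 , v0 ) ;".split()
    refactored = "int sum = calc . add ( 2 , 2 ) ; assertEquals ( 4 , sum ) ;".split()
    rl = rouge_l_f1(original, refactored)
    # The CodeBLEU and METEOR inputs below are made-up placeholder values.
    print(round(composite_score(codebleu=0.70, meteor=0.85, rouge_l=rl), 3))
```

A sketch like this illustrates the motivation given in the abstract: a refactoring that only renames variables lowers token-overlap scores somewhat, and averaging across several metrics keeps a single low component from dominating the verdict.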
Displayed time zone: Brasilia, Distrito Federal, Brazil