Fri 2 May 2025 15:15 - 15:30 at 214 - AI for Testing and QA 6 Chair(s): Ladan Tahvildari

Large Language Models (LLMs) have shown significant potential in automating software engineering tasks, particularly code generation. However, current evaluation benchmarks, which primarily focus on accuracy, fall short in assessing the quality of the generated code, specifically these models' tendency to produce code smells. To address this limitation, we introduce CodeSmellEval, a benchmark designed to evaluate the propensity of LLMs to generate code smells. Our benchmark comprises a novel metric, the Propensity Smelly Score (PSC), and a curated dataset of method-level code smells, CodeSmellData. To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral. The results reveal that both models tend to generate code smells, such as simplifiable-condition and consider-merging-isinstance. These findings highlight the effectiveness of our benchmark in evaluating LLMs and provide valuable insights into their reliability and their propensity to introduce code smells in code generation tasks.
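The smells named above, simplifiable-condition and consider-merging-isinstance, are Pylint message symbols, so the idea behind the benchmark can be illustrated with Pylint as the smell detector. The sketch below is a minimal approximation, not the paper's implementation: it assumes a propensity score can be computed as the fraction of generated samples that Pylint flags with a given smell (the paper's exact PSC formulation may differ), and the pylint_smells and propensity_scores helpers are hypothetical names introduced here.

```python
# Minimal sketch of a smell-propensity measurement in the spirit of
# CodeSmellEval. Approximates a propensity score as the fraction of
# generated samples flagged with a given Pylint smell.
import json
import subprocess
import tempfile
from collections import Counter
from pathlib import Path

# Pylint message symbols mentioned in the abstract.
SMELLS = {"simplifiable-condition", "consider-merging-isinstance"}

def pylint_smells(code: str) -> set[str]:
    """Run Pylint on one generated snippet; return the tracked smells it reports."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        out = subprocess.run(
            ["pylint", "--output-format=json", path],
            capture_output=True, text=True,
        )
        messages = json.loads(out.stdout or "[]")
        return {m["symbol"] for m in messages} & SMELLS
    finally:
        Path(path).unlink()

def propensity_scores(samples: list[str]) -> dict[str, float]:
    """Fraction of generated samples exhibiting each tracked smell."""
    counts = Counter(s for sample in samples for s in pylint_smells(sample))
    return {smell: counts[smell] / len(samples) for smell in SMELLS}

# Usage: score a batch of LLM-generated method bodies.
generated = [
    "def f(x, y):\n    if x or (not x and y):\n        return 1\n    return 0\n",
]
print(propensity_scores(generated))
```

Because pylint_smells returns a set per sample, each smell is counted at most once per sample, so the score reads as "share of samples containing this smell" rather than a raw message count.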

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
AI for Testing and QA 6
Journal-first Papers / Research Track / New Ideas and Emerging Results (NIER) at 214
Chair(s): Ladan Tahvildari University of Waterloo
14:00
15m
Talk
Treefix: Enabling Execution with a Tree of Prefixes (Artifact-Functional, Artifact-Available, Artifact-Reusable)
Research Track
Beatriz Souza University of Stuttgart, Michael Pradel University of Stuttgart
Pre-print
14:15
15m
Talk
Assessing Evaluation Metrics for Neural Test Oracle Generation
Journal-first Papers
Jiho Shin York University, Hadi Hemmati York University, Moshi Wei York University, Song Wang York University
14:30
15m
Talk
Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
Journal-first Papers
Saurabhsingh Rajput Dalhousie University, Tim Widmayer University College London (UCL), Ziyuan Shang Nanyang Technological University, Maria Kechagia National and Kapodistrian University of Athens, Federica Sarro University College London, Tushar Sharma Dalhousie University
14:45
15m
Talk
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Journal-first Papers
Hao Li Queen's University, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Cor-Paul Bezemer University of Alberta
Link to publication · DOI · Pre-print
15:00
15m
Talk
Evaluating the Generalizability of LLMs in Automated Program Repair
New Ideas and Emerging Results (NIER)
Fengjie Li Tianjin University, Jiajun Jiang Tianjin University, Jiajun Sun Tianjin University, Hongyu Zhang Chongqing University
Pre-print
15:15
15m
Talk
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
New Ideas and Emerging Results (NIER)
Alejandro Velasco William & Mary, Daniel Rodriguez-Cardenas William & Mary, David Nader Palacio William & Mary, Lutfar Rahman Alif University of Dhaka, Denys Poshyvanyk William & Mary
Pre-print