SANER 2025
Tue 4 - Fri 7 March 2025 Montréal, Québec, Canada
Fri 7 Mar 2025 11:00 - 11:15 at M-1410 - Code Quality and Refactoring Chair(s): Wesley Assunção

In recent years, AI-based software engineering has progressed from pre-trained models to advanced agentic workflows, with Software Development Agents representing the next major leap. These agents, capable of reasoning, planning, and interacting with external environments, offer promising solutions to complex software engineering tasks. However, while much research has evaluated code generated by large language models (LLMs), comprehensive studies on agent-generated patches, particularly in real-world settings, are lacking. This study addresses that gap by evaluating 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues from SWE-Bench Verified, focusing on their impact on code quality. Our analysis shows no single agent dominated, with 170 issues unresolved, indicating room for improvement. Even for patches that passed unit tests and resolved issues, agents made different file and function modifications compared to the gold patches from repository developers, revealing limitations in the benchmark’s test case coverage. Most agents maintained code reliability and security, avoiding new bugs or vulnerabilities; while some agents increased code complexity, many reduced code duplication and minimized code smells. Finally, agents performed better on simpler codebases, suggesting that breaking complex tasks into smaller sub-tasks could improve effectiveness. This study provides the first comprehensive evaluation of agent-generated patches on real-world GitHub issues, offering insights to advance AI-driven software development.

Fri 7 Mar

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Code Quality and Refactoring - Research Papers / Reproducibility Studies and Negative Results (RENE) Track at M-1410
Chair(s): Wesley Assunção North Carolina State University
11:00
15m
Talk
Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios
Research Papers
Zhi Chen Singapore Management University, Lingxiao Jiang Singapore Management University
Pre-print
11:15
15m
Talk
Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects
Research Papers
Henrique Gomes Nunes Federal University of Minas Gerais, Eduardo Figueiredo Federal University of Minas Gerais, Larissa Rocha State University of Bahia, Sarah Nadi New York University Abu Dhabi, Fischer Ferreira Federal University of Ceará, Geanderson Esteves dos Santos Federal University of Minas Gerais
11:30
15m
Talk
Exploring the Potential of Llama Models in Automated Code Refinement: A Replication Study
Research Papers
Genevieve Caumartin Concordia University, Qiaolin Qin Polytechnique Montréal, Heng Li Polytechnique Montréal, Diego Costa Concordia University, Canada
Pre-print
11:45
15m
Talk
Exploring the Relationship between Technical Debt and Lead Time: An Industrial Case Study
Reproducibility Studies and Negative Results (RENE) Track
Bhuwan Paudel Blekinge Institute of Technology, Javier Gonzalez-Huerta Blekinge Institute of Technology, Ehsan Zabardast Nordea, Blekinge Institute of Technology, Eriks Klotins Blekinge Institute of Technology