To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study
LLM-based autonomous coding agents have reshaped software development. While these agents show exceptional capability in code generation, open questions persist about the long-term maintainability of AI-generated code. This study empirically investigates the maintenance extent, human involvement, and modification types of AI-generated files compared with human-authored code. Using the AIDev dataset, which collects AI-generated pull requests, together with GitHub, we analyzed over 1,000 files and approximately 3,200 changes from 100 popular repositories. Our findings show that: (i) AI-generated files are maintained less frequently than human-authored code, with updates affecting only a small fraction of the total file size; (ii) the most frequent modifications to AI code are feature extensions, whereas human code updates focus more on bug fixes; and (iii) human developers perform the large majority (approximately 83%) of the maintenance on AI-generated files. These findings suggest that while AI agents produce code with sufficient initial quality, sustained human involvement remains necessary.
Wed 10 Jun (displayed time zone: London)
11:00 - 12:30 | LLMs for SE (Coding) 1 | Short Papers and Emerging Results / Posters and Vision / Industry Papers / AI Models / Data / Research Papers | at JMS 743 | Chair(s): Muhammad Waseem (Faculty of Information Technology and Communication Sciences, Tampere University, 33014 Tampere, Finland)
11:00 15m Talk | OpenClassGen: A Large-Scale Corpus of Real-World Python Classes for LLM Research | AI Models / Data | Musfiqur Rahman (Concordia University, Montreal), SayedHassan Khatoonabadi (Concordia University), Emad Shihab (Concordia University) | DOI, Pre-print
11:15 10m Talk | Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study | Industry Papers | Sivajeet Chand (Technical University of Munich), Kevin Nguyen (BMW Group), Peter Kuntz (BMW Group), Alexander Pretschner (TU Munich) | Pre-print
11:25 15m Talk | Bridging the Programming Language Gap: Constructing a Multilingual Shared Semantic Space through AST Unification and Graph Matching | Research Papers | Junhao Chen (Nanjing University of Aeronautics and Astronautics), Jingxuan Zhang (Nanjing University of Aeronautics and Astronautics), Jian He (Shanghai Aerospace Electronic Technology Institute), Yixuan Tang (Nanjing University of Aeronautics and Astronautics), Weiqin Zou (Nanjing University of Aeronautics and Astronautics) | Pre-print
11:40 15m Talk | SelfHeal: Empirical Fix Pattern Analysis and Bug Repair in LLM Agents | Research Papers | Niful Islam (Oakland University), Muhammad Anas Raza (Oakland University), Mohammad Wardat (Oakland University, USA) | Pre-print
11:55 10m Talk | Quo Vadis, Code Review? Exploring the Future of Code Review | Short Papers and Emerging Results | Michael Dorner (Technische Hochschule Nürnberg Georg Simon Ohm), Andreas Bauer (Technische Hochschule Nürnberg Georg Simon Ohm), Darja Šmite (Blekinge Institute of Technology), Lukas Thode (Blekinge Institute of Technology), Daniel Mendez (Blekinge Institute of Technology and fortiss), Ricardo Britto (Ericsson / Blekinge Institute of Technology), Stephan Lukasczyk (JetBrains Research), Ehsan Zabardast (Nordea / Blekinge Institute of Technology), Michael Kormann (SAP) | Pre-print
12:05 10m Talk | To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study | Short Papers and Emerging Results | Shota Sawada (National Institute of Technology (KOSEN), Nara College), Tatsuya Shirai (Nara Institute of Science and Technology), Yutaro Kashiwa (Nara Institute of Science and Technology), Ken'Ichi Yamaguchi (Nara National College of Technology), Hiroshi Iwata (Nara National College of Technology), Hajimu Iida (Nara Institute of Science and Technology)
12:15 10m Talk | GenAI in Software Engineering: The Role of Technology Acceptance Models | Posters and Vision | Oscar Johansson (Blekinge Institute of Technology), Jürgen Börstler (Blekinge Institute of Technology), Nauman bin Ali (Blekinge Institute of Technology)