Autorepairability of ChatGPT and Gemini: A Comparative Study
In recent years, Automated Program Repair (APR), which fixes defects in source code without human intervention, has become an active research topic in software engineering, and numerous repair techniques have been proposed. In this context, Lapvikai et al. introduced a software quality metric called "Autorepairability", which indicates how easily bugs in the target source code can be fixed by APR techniques.
By using Autorepairability, developers can check in advance whether APR techniques are likely to work effectively on the target software, and can refactor the code to improve its Autorepairability. Lapvikai et al. not only proposed the metric but also measured it with a traditional APR technique based on genetic programming. In the past few years, however, program repair using large language models (LLMs) has become widespread, and several studies have reported that LLMs outperform traditional APR techniques in repair capability.
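To make the metric concrete, the following is a minimal sketch, assuming Autorepairability is computed as the proportion of seeded (artificially injected) faults that a given APR technique successfully repairs. The function names, the fault-seeding step, and the repair oracle here are hypothetical illustrations, not Lapvikai et al.'s actual implementation.

```python
# Hypothetical sketch: Autorepairability as the fraction of seeded faults
# that an APR technique manages to repair. `apr_fix` and the set of seeded
# faults are placeholders, not the authors' tooling.

from typing import Callable, Iterable


def autorepairability(seeded_faults: Iterable[str],
                      apr_fix: Callable[[str], bool]) -> float:
    """Return the fraction of seeded faulty variants that `apr_fix` repairs.

    seeded_faults: identifiers of faulty program variants, e.g. produced by
                   mutating one location of the target source code.
    apr_fix:       runs one APR technique (kGenProg, an LLM prompt, ...) and
                   reports whether all tests pass after the attempted repair.
    """
    faults = list(seeded_faults)
    if not faults:
        return 0.0
    repaired = sum(1 for fault in faults if apr_fix(fault))
    return repaired / len(faults)


# Usage (illustrative): compare techniques on the same seeded faults.
# scores = {name: autorepairability(faults, fix)
#           for name, fix in techniques.items()}
```

Under this reading, comparing techniques amounts to computing the same ratio for each of them over an identical set of seeded faults, which is the kind of comparison carried out in this study.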
In this study, we use Autorepairability to compare the performance of multiple APR techniques. Specifically, we measured and compared Autorepairability using ChatGPT and Gemini, two representative LLMs, as well as kGenProg, a traditional APR technique. The results show that Gemini achieved higher repair capability than both ChatGPT and kGenProg.