Exploring True Test Overfitting in Dynamic Automated Program Repair using Formal Methods
Automated program repair (APR) techniques have shown a promising ability to generate patches that fix program bugs automatically. Typically such APR tools are dynamic in the sense that they find bugs by testing and they validate patches by running a program’s test suite. Patches can also be validated manually. However, neither of these methods for validating patches can truly tell whether a patch is correct. Test suites are usually incomplete, and thus APR-generated patches may pass the tests but not be truly correct; in other words, the APR tools may be overfitting to the tests. The possibility of test overfitting leads to manual validation, which is costly, potentially biased, and can also be incomplete. Therefore, we must move past these methods to truly assess APR’s overfitting problem.

We aim to evaluate the test overfitting problem in dynamic APR tools using ground truth given by a set of programs equipped with formal behavioral specifications. Using these formal specifications and an automated verification tool, we found that there is definitely overfitting in the generated patches of seven well-studied APR tools, although many (about 59%) of the generated patches were indeed correct. Our study further points out two new problems that can affect APR tools: changes to the complexity of programs and numeric problems. An additional contribution is that we introduce the first publicly available data set of formally specified and verified Java programs, their test suites, and buggy variants, each of which has exactly one bug.
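As a rough illustration of the distinction the abstract draws between passing a test suite and being truly correct, here is a minimal sketch. Everything in it is hypothetical: the functions, the three-case test suite, and the specification are invented for illustration and are not taken from the paper or its data set. The exhaustive check over a small input range is only a stand-in for an automated verification tool, which would establish the specification for all inputs.

```python
# Hypothetical example: none of these names come from the paper or its data set.

def buggy_abs(x: int) -> int:
    # Bug: negative inputs are returned unchanged.
    return x

def overfitted_patch(x: int) -> int:
    # Special-cases the only negative input that the test suite exercises.
    if x == -3:
        return 3
    return x

def correct_patch(x: int) -> int:
    return -x if x < 0 else x

# An incomplete test suite: (input, expected output) pairs.
TESTS = [(0, 0), (5, 5), (-3, 3)]

def passes_tests(f) -> bool:
    return all(f(x) == expected for x, expected in TESTS)

def satisfies_spec(f, domain=range(-100, 101)) -> bool:
    # Specification of absolute value: the result is non-negative and is x or -x.
    # Assumption: checking a bounded domain stands in for formal verification.
    return all(f(x) >= 0 and f(x) in (x, -x) for x in domain)

for name, f in [("buggy", buggy_abs),
                ("overfitted", overfitted_patch),
                ("correct", correct_patch)]:
    print(f"{name}: passes tests = {passes_tests(f)}, "
          f"satisfies spec = {satisfies_spec(f)}")
```

In this sketch the overfitted patch passes every test yet violates the specification, while the correct patch satisfies both. That gap is what a test suite alone cannot reveal.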
Tue 18 Apr. Displayed time zone: Dublin
14:00 - 15:30 | Session 10: Program Repair (Research Papers / Previous Editions / Posters) at Grand canal. Chair(s): Gunel Jahangirova (USI Lugano, Switzerland)
14:00 | 20m | Talk | Exploring True Test Overfitting in Dynamic Automated Program Repair using Formal Methods | Previous Editions | Amirfarhad Nilizadeh (University of Central Florida); Gary T. Leavens (University of Central Florida); Xuan Bach D. Le (The University of Melbourne); Corina S. Păsăreanu (Carnegie Mellon University); David Cok (Safer Software Consulting, LLC)
14:20 | 20m | Talk | Embedding Context as Code Dependencies for Neural Program Repair | Research Papers | Noor Nashid (University of British Columbia); Mifta Sintaha (University of British Columbia); Ali Mesbah (University of British Columbia (UBC))
14:40 | 20m | Talk | CorCA: An Automatic Program Repair Tool for Checking and Removing Effectively C Flaws | Research Papers | João Inácio (LASIGE, Faculdade de Ciências da Universidade de Lisboa); Ibéria Medeiros (LaSIGE, Faculdade de Ciências da Universidade de Lisboa)
15:00 | 20m | Talk | Set the right example when teaching programming: Test Informed Learning with Examples (TILE) | Research Papers | Niels Doorn (Open Universiteit and NHL Stenden University of Applied Sciences); Tanja E. J. Vos (Universitat Politècnica de València and Open Universiteit); Beatriz Marín (Universitat Politècnica de València); Erik Barendsen (Open Universiteit)
15:20 | 5m | Talk | Poster: Software Fault Localization as a Service (SFLaaS) | Posters | Qusay Idrees Sarhan (Department of Software Engineering, University of Szeged); Hassan Bapeer Hassan (University of Duhok); Árpád Beszédes (Department of Software Engineering, University of Szeged)
15:25 | 5m | Talk | Poster: Improving Spectrum Based Fault Localization For Python Programs Using Weighted Code Elements | Posters | Qusay Idrees Sarhan (Department of Software Engineering, University of Szeged); Árpád Beszédes (Department of Software Engineering, University of Szeged)