Some Automatically Generated Patches are More Likely to be Correct than Others: An Analysis of Defects4J Patch Features
Defects4J is a popular dataset against which many Java Automatic Program Repair (APR) tools benchmark their performance. However, recent evidence suggests that some APR tools overfit to Defects4J, producing plausible patches that are incorrect. What we do not currently know is whether the plausible patches that turn out to be incorrect share any common features. We compare the features of Defects4J’s human-written patches for defects that existing APR tools patched correctly against those for defects they patched incorrectly. We found that 48.4% of the defects in Defects4J v1.5 have been automatically patched by existing APR tools: 28.9% of all defects have been correctly patched, leaving 19.5% incorrectly patched. We found that the human-written patches of defects incorrectly patched by APR tools were twice the size of those of defects that have been correctly patched. We also found that human-written patches that added a method call, added a variable, or wrapped existing code with new code, such as a try/catch block, were significantly associated with incorrect patches. Editing only a single line was significantly associated with correct patches. Our results suggest that current tools are weak at generating multi-line patches and at synthesising new code, especially when wrapping existing code. Our results highlight potential future areas of development for new APR approaches.
Thu 19 May, 10:45 - 11:00 (Eastern Time, US & Canada)
10:45 | 7m | Talk | Some Automatically Generated Patches are More Likely to be Correct than Others: An Analysis of Defects4J Patch Features | APR | Gareth Bennett (Lancaster University), Tracy Hall (Lancaster University), David Bowes (Lancaster University)
10:52 | 7m | Live Q&A | Q&A | APR