Refining Fitness Functions for Search-Based Automated Program Repair: A Case Study with ARJA and ARJA-e
Automated Program Repair (APR) strives to fix faulty software automatically, without human intervention. Search-based APR iteratively generates candidate patches for a given buggy program, guided by the execution of the patched program on a given test suite (i.e., a set of test cases). Search-based approaches have generally used only Boolean test case results (i.e., pass or fail), but more fine-grained fitness evaluations have recently been investigated, with promising yet unsettled results. Using the most recent extension of the widely used Defects4J bug dataset, we conduct an empirical study with ARJA and ARJA-e, two state-of-the-art search-based APR systems that use a Boolean and a non-Boolean fitness function, respectively. We aim both to extend previous results using new bugs from Defects4J v2.0 and to settle whether refining the fitness function helps fix bugs in large software.
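To make the difference between a Boolean and a finer-grained fitness concrete, here is a minimal, hypothetical Python sketch. It is not the fitness function of ARJA or ARJA-e. It assumes each test case reports how many of its assertions passed; the `TestOutcome` record, `boolean_fitness`, and `refined_fitness` are illustrative names only, and higher scores are better in this sketch. Two patches that fail the same test case look identical under the Boolean score, while the refined score rewards the patch that comes closer to passing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestOutcome:
    """Hypothetical record of one test case run against a candidate patch."""
    assertions_passed: int
    assertions_total: int

    @property
    def passed(self) -> bool:
        # Boolean view: a test case passes only if every assertion passes.
        return self.assertions_passed == self.assertions_total


def boolean_fitness(outcomes: List[TestOutcome]) -> float:
    """Fraction of test cases that pass (pass/fail information only)."""
    return sum(o.passed for o in outcomes) / len(outcomes)


def refined_fitness(outcomes: List[TestOutcome]) -> float:
    """Hypothetical finer-grained score: mean fraction of passing assertions per test."""
    return sum(o.assertions_passed / o.assertions_total for o in outcomes) / len(outcomes)


# Two candidate patches that both fail the third test case,
# but patch_b satisfies more of that test's assertions.
patch_a = [TestOutcome(3, 3), TestOutcome(3, 3), TestOutcome(0, 4)]
patch_b = [TestOutcome(3, 3), TestOutcome(3, 3), TestOutcome(3, 4)]

# Boolean fitness cannot tell the patches apart; the refined fitness ranks patch_b higher.
print(boolean_fitness(patch_a), boolean_fitness(patch_b))
print(refined_fitness(patch_a), refined_fitness(patch_b))
```

The extra signal is what a refined fitness offers the search: it can distinguish among partially correct patches that a pass/fail score treats as equal.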
In our experiments using 151 non-deprecated and not previously evaluated bugs from Defects4J v2.0, ARJA was able to find patches for 6.62% (10/151) of the bugs, whereas ARJA-e found patches for 7.95% (12/151). We thus observe only a small advantage to using the refined fitness function. This contrasts with previous work using Defects4J v1.0.1, where ARJA was able to find adequate patches for 24.2% (59/244) of the bugs and ARJA-e for 43.4% (106/244). These results may indicate potential overfitting of the tools to the previous version of the Defects4J dataset.
Presentation at: https://youtu.be/lcpYTv1TaE8