Challenging Bug Prediction and Repair Models with Synthetic Bugs
Bugs are an essential concern in software engineering; over the past decades, many techniques have been proposed to detect, localize, and repair bugs in software systems. Evaluating the effectiveness of such techniques requires complex bugs, i.e., those that are hard to detect through testing and hard to repair through debugging. From the classic software engineering point of view, a hard-to-repair bug differs from the correct code in multiple locations, making it hard to localize and repair. Hard-to-detect bugs, on the other hand, manifest themselves only under specific test inputs and reachability conditions. These two objectives, i.e., generating hard-to-detect and hard-to-repair bugs, are mostly aligned; a bug generation technique can change multiple statements that are covered only under a specific set of inputs. However, the two objectives conflict for learning-based techniques: to challenge a bug prediction model, a bug should have a code representation similar to that of the correct code in the training data, so that the model struggles to distinguish them. The hard-to-repair bug definition remains the same but with a caveat: the more a bug differs from the original code (at multiple locations), the more distant their representations are and the easier the bug is to detect. This demands new bug generation techniques that complement existing bug datasets and challenge learning-based bug prediction and repair techniques. We propose BugFarm to transform arbitrary code into multiple hard-to-detect and hard-to-repair bugs. BugFarm mutates code in multiple locations (hard-to-repair) but leverages attention analysis to change only the locations least attended by the underlying model (hard-to-detect).
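The idea of restricting mutations to the least attended locations can be illustrated with a minimal sketch. It is not BugFarm's implementation: the abstract does not say how attention is aggregated, so averaging per-token weights over each statement is an assumption, and the function name, the weights, and the token-to-statement mapping are all hypothetical.

```python
from typing import Dict, List, Sequence


def least_attended_statements(
    token_attention: Sequence[float],
    token_to_statement: Sequence[int],
    k: int,
) -> List[int]:
    """Return the indices of the k statements with the lowest mean attention.

    Assumption: a statement's attention is the mean of its tokens' weights.
    """
    if len(token_attention) != len(token_to_statement):
        raise ValueError("each token needs exactly one statement index")

    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for weight, stmt in zip(token_attention, token_to_statement):
        totals[stmt] = totals.get(stmt, 0.0) + weight
        counts[stmt] = counts.get(stmt, 0) + 1

    mean_attention = {stmt: totals[stmt] / counts[stmt] for stmt in totals}
    # Sort by mean attention; break ties by statement index for determinism.
    ranked = sorted(mean_attention, key=lambda stmt: (mean_attention[stmt], stmt))
    return ranked[:k]


# Hypothetical attention weights for a three-statement method (six tokens).
weights = [0.30, 0.25, 0.05, 0.10, 0.20, 0.10]
statements = [0, 0, 1, 1, 2, 2]
candidates = least_attended_statements(weights, statements, k=2)
```

Under these made-up weights, statements 1 and 2 would be the mutation candidates, while statement 0, which the model attends to most, would be left unchanged.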
Our comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BugFarm and two alternative approaches demonstrates BugFarm's superiority in generating bugs that are hard to detect by learning-based bug prediction approaches (up to 40.53% higher False Negative Rate and 10.76%, 5.2%, 28.93%, and 20.53% lower Accuracy, Precision, Recall, and F1 score, respectively) and hard to repair by a state-of-the-art learning-based program repair technique (28% repair success rate, compared to 36% and 49% for LEAM and μBERT bugs, respectively). BugFarm is efficient, i.e., it takes nine seconds to mutate a piece of code, with no training overhead.
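For readers less familiar with the reported metrics, the following sketch computes them from a binary confusion matrix using the standard definitions, with "buggy" as the positive class. The counts are invented for illustration and are not taken from the paper.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics; 'buggy' is the positive class."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero on degenerate inputs.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # False Negative Rate: share of real bugs the model misses.
        "fnr": ratio(fn, fn + tp),
    }


# Invented counts: 100 buggy and 100 correct samples.
scores = detection_metrics(tp=60, fp=10, tn=90, fn=40)
```

Because False Negative Rate equals one minus Recall, bugs that are harder to detect raise the former and lower the latter by the same number of percentage points.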
Tue 9 Sep (displayed time zone: Auckland, Wellington)
13:30 - 14:30
13:30 (20m) Research paper | Configurable Ensembles for Software Similarity: Challenging the Notion of Universal Metrics | Research Track | Shujun Huang (Software Engineering Research Group (SERG), TU Delft), Sebastian Proksch (Delft University of Technology)
13:50 (20m) Research paper | Challenging Bug Prediction and Repair Models with Synthetic Bugs | Research Track | Ali Reza Ibrahimzada (University of Illinois Urbana-Champaign), Yang Chen (University of Illinois at Urbana-Champaign), Ryan Rong (Stanford University), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)
14:10 (20m) Research paper | Plaintext in the Wild: Investigating Secure Connection Label Accuracy for Android Apps | Research Track | Yusei Sakuraba (Okayama University), Hiroki Inayoshi (Okayama University), Shoichi Saito (Nagoya Institute of Technology), Akito Monden (Okayama University)