Configurable Ensembles for Software Similarity: Challenging the Notion of Universal Metrics
Software similarity analysis is crucial in various fields, including code clone detection, security analysis, and software refactoring. While research continues to identify new use cases, numerous similarity detectors have already been proposed for specific contexts. These detectors usually leverage project attributes, such as source code, contributors, documentation, and dependencies. Existing works consistently report that their approaches outperform others in extensive evaluations. In this paper, we challenge the idea of a universally superior similarity model. We argue that similarity is a fluid concept and that relevant metrics always depend on specific needs. We present a novel framework that enables a flexible aggregation of diverse similarity models, allowing fine-tuned configurations for specific needs and use cases. Our evaluation incorporates multiple existing similarity models and their respective benchmarks to reveal the fundamental dilemma: depending on the configuration, our aggregated model will either confirm prior results or expose significant differences among individual models. However, we demonstrate that these variations can be explained by the additional information, which leads to more fine-grained results. Our results illustrate the future of software similarity research: configurable ensembles of much more specialized models.
Tue 9 Sep (displayed time zone: Auckland, Wellington)
13:30 - 14:30
13:30 | 20m | Research paper | Configurable Ensembles for Software Similarity: Challenging the Notion of Universal Metrics | Research Track | Shujun Huang (Software Engineering Research Group (SERG), TU Delft), Sebastian Proksch (Delft University of Technology)
13:50 | 20m | Research paper | Challenging Bug Prediction and Repair Models with Synthetic Bugs | Research Track | Ali Reza Ibrahimzada (University of Illinois Urbana-Champaign), Yang Chen (University of Illinois at Urbana-Champaign), Ryan Rong (Stanford University), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)
14:10 | 20m | Research paper | Plaintext in the Wild: Investigating Secure Connection Label Accuracy for Android Apps | Research Track | Yusei Sakuraba (Okayama University), Hiroki Inayoshi (Okayama University), Shoichi Saito (Nagoya Institute of Technology), Akito Monden (Okayama University)