FRUGAL: Unlocking Semi-supervised Learning for Software Analytics
Standard software analytics often requires a large amount of labelled data to commission models with acceptable performance. However, prior work has shown that such requirements can be expensive, taking several weeks to label thousands of commits, and that labelled data is not always available when exploring new research problems and domains. Unsupervised learning is a promising direction for learning hidden patterns within unlabelled data, but it has been studied extensively only in defect prediction. Moreover, unsupervised learning can be ineffective by itself and has not been explored in other domains (e.g., static analysis and issue close time).
Motivated by this literature gap and these technical limitations, we explore the performance variations seen in several simple optimization schemes. We present FRUGAL, a tuned semi-supervised method that builds on a simple optimization scheme and requires neither sophisticated methods (e.g., deep learners) nor expensive ones (e.g., 100% manually labelled data). Our method optimizes the unsupervised learner’s configurations via grid search, validating the picked settings on only 10% of the labelled training data before predicting. FRUGAL outperforms the state-of-the-art actionable static code warning recognizer and issue close time predictor with less information, reducing the cost of labelling by 90%.
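The tuning loop described above (grid search over an unsupervised learner's settings, scored on a 10% labelled sample) can be illustrated with a minimal Python sketch. This is not FRUGAL's implementation: the percentile-threshold learner, the F1 scoring, the synthetic data, and every function name here are assumptions made purely for illustration.

```python
import random


def percentile_threshold(scores, pct):
    """Value at the pct-th percentile (0-100) of scores, nearest-rank style."""
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]


def unsupervised_predict(rows, pct):
    """Hypothetical unsupervised learner: flag rows whose feature sum
    is at or above the pct-th percentile. Uses no labels at all."""
    scores = [sum(r) for r in rows]
    cut = percentile_threshold(scores, pct)
    return [1 if s >= cut else 0 for s in scores]


def f1(y_true, y_pred):
    """F1 score for binary labels (0.0 when there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def tune_on_labelled_fraction(rows, labels, grid, fraction=0.1, seed=0):
    """Grid-search the learner's setting; only `fraction` of the labels
    are ever consulted, and only to score each candidate setting."""
    rng = random.Random(seed)
    n_labelled = max(1, int(len(rows) * fraction))
    labelled_idx = rng.sample(range(len(rows)), n_labelled)
    best_pct, best_score = None, -1.0
    for pct in grid:
        preds = unsupervised_predict(rows, pct)
        score = f1([labels[i] for i in labelled_idx],
                   [preds[i] for i in labelled_idx])
        if score > best_score:
            best_pct, best_score = pct, score
    return best_pct, best_score, n_labelled


# Synthetic data (assumption): two random features, positive when their sum is large.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if sum(r) > 1.2 else 0 for r in rows]

best_pct, best_f1, n_used = tune_on_labelled_fraction(
    rows, labels, grid=range(10, 100, 10))
print(f"best percentile={best_pct}, F1 on labelled sample={best_f1:.2f}, "
      f"labels used={n_used} of {len(rows)}")
```

The point of the sketch is the division of labour: the learner itself never sees a label, and the 10% labelled sample is used only to choose among candidate configurations.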
Our conclusions are twofold. First, FRUGAL can save considerable effort in data labelling, especially when validating prior work or researching new problems. Second, proponents of complex and expensive methods should always baseline such methods against simpler and cheaper alternatives. For instance, a semi-supervised learner like FRUGAL can serve as a baseline for state-of-the-art software analytics tools.
Wed 17 Nov (displayed time zone: Hobart)
09:00 - 10:00
|Faster Mutation Analysis with Fewer Processes and Smaller Overheads|
|FRUGAL: Unlocking Semi-supervised Learning for Software Analytics|
|Automatically Deciding on the Integration of Commits Based on Their Descriptions|
|SigRec: Automatic Recovery of Function Signatures in Smart Contracts|
Ting Chen University of Electronic Science and Technology of China, Zihao Li The Hong Kong Polytechnic University, Xiapu Luo Hong Kong Polytechnic University, XiaoFeng Wang Indiana University Bloomington, Ting Wang Penn State University, Hongwei Li University of Electronic Science and Technology of China, Xiaosong Zhang University of Electronic Science and Technology of China