How to Improve Deep Learning for Software Analytics (a case study with code smell detection)
Tue 24 May 2022 09:30 - 09:45 at Room 315+316 - Blended Technical Session 3 (Smells and Maintenance) Chair(s): Andy Zaidman
To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, without much exploration of alternatives within that technology.
One promising alternative for software analytics and deep learning is “GHOST”, which combines hyper-parameter optimization of feedforward neural networks with a novel oversampling technique to deal with class imbalance.
The prior study from TSE’21 proposing this novel “fuzzy sampling” was somewhat limited in that the method was tested on defect prediction, but nothing else. Like defect prediction, code smell detection datasets have a class imbalance (which motivated “fuzzy sampling”). Hence, in this work we test if fuzzy sampling is useful for code smell detection.
The results of this paper show that we can achieve better than state-of-the-art results on code smell detection with fuzzy oversampling. For example, for “feature envy”, we were able to achieve 99+% AUC across all our datasets, and on 8/10 datasets for “misplaced class”. While our specific results refer to code smell detection, they do suggest lessons for other kinds of analytics. For example: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; (c) try “fuzzy sampling” as one such baseline.
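As a rough illustration of what oversampling a minority class can look like, the sketch below adds jittered copies of minority rows until the two classes are the same size. It is not the procedure from the TSE’21 paper: the function name `fuzzy_oversample`, the `radius` parameter, and the widening-ring jitter are all assumptions made for this example.

```python
import random


def fuzzy_oversample(rows, labels, minority=1, radius=0.01, seed=0):
    """Add jittered copies of minority rows until the classes are balanced.

    Illustrative only: the jitter scheme and `radius` are assumptions,
    not the method described in the paper.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_major = len(labels) - len(minority_rows)
    deficit = n_major - len(minority_rows)
    if not minority_rows or deficit <= 0:
        # Nothing to add: no minority rows, or already balanced.
        return list(rows), list(labels)

    new_rows, new_labels = list(rows), list(labels)
    for i in range(deficit):
        base = minority_rows[i % len(minority_rows)]
        # The ring index grows with each full pass over the minority rows,
        # so later copies may land farther from the original point.
        ring = 1 + i // len(minority_rows)
        jittered = [v + rng.uniform(-1, 1) * radius * ring for v in base]
        new_rows.append(jittered)
        new_labels.append(minority)
    return new_rows, new_labels


# Toy data: four majority rows (label 0) and one minority row (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0]]
y = [0, 0, 0, 0, 1]
X2, y2 = fuzzy_oversample(X, y)
print(sum(y2), len(y2) - sum(y2))
```

The original rows are kept unchanged and only synthetic minority rows are appended, so a learner trained on `X2, y2` sees both classes equally often.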
Wed 18 May. Displayed time zone: Eastern Time (US & Canada)
| 13:00 - 13:50 | Session 4: Software Quality (Bugs & Smells) | Data and Tool Showcase Track / Technical Papers at MSR Main room - odd hours. Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Mahmoud Alfadel (University of Waterloo) |
| 13:00 | 7m Talk | Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue. Technical Papers. Rui Shu (North Carolina State University), Tianpei Xia (North Carolina State University), Laurie Williams (North Carolina State University), Tim Menzies (North Carolina State University) |
| 13:07 | 7m Talk | To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set? Technical Papers. Matteo Ciniselli (Università della Svizzera Italiana), Luca Pascarella (Università della Svizzera italiana (USI)), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana). Pre-print |
| 13:14 | 7m Talk | How to Improve Deep Learning for Software Analytics (a case study with code smell detection). Technical Papers. Pre-print |
| 13:21 | 7m Talk | Using Active Learning to Find High-Fidelity Builds. Technical Papers. Harshitha Menon (Lawrence Livermore National Lab), Konstantinos Parasyris (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory), Tom Scogland (Lawrence Livermore National Laboratory). Pre-print |
| 13:28 | 4m Talk | ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction. Data and Tool Showcase Track. Hossein Keshavarz (David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada), Mei Nagappan (University of Waterloo). Pre-print |
| 13:32 | 4m Talk | ReCover: a Curated Dataset for Regression Testing Research. Data and Tool Showcase Track. Francesco Altiero (Università degli Studi di Napoli Federico II), Anna Corazza (Università degli Studi di Napoli Federico II), Sergio Di Martino (Università degli Studi di Napoli Federico II), Adriano Peron (Università degli Studi di Napoli Federico II), Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II) |
| 13:36 | 14m Live Q&A | Discussions and Q&A. Technical Papers |
