MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022

To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, without much exploration of alternatives within that technology.

One promising alternative for software analytics and deep learning is “GHOST”, which relies on a combination of hyper-parameter optimization of feedforward neural networks and a novel oversampling technique to deal with class imbalance.

The prior study from TSE’21 proposing this novel “fuzzy sampling” was somewhat limited in that the method was tested on defect prediction, but nothing else. Like defect prediction, code smell detection datasets have a class imbalance (which motivated “fuzzy sampling”). Hence, in this work we test if fuzzy sampling is useful for code smell detection.
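To make the class imbalance problem concrete, here is a minimal sketch of oversampling in general. It balances a binary dataset by adding slightly perturbed copies of minority-class rows. This is not the fuzzy sampling procedure from the TSE’21 study, whose details are not given here. The function name `oversample_minority`, the `radius` parameter, and the uniform jitter are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def oversample_minority(
    X: List[Row], y: List[int], radius: float = 0.01, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Balance a binary dataset by adding jittered copies of minority rows.

    Hypothetical helper: `radius` and the uniform jitter are illustrative
    assumptions, not the published fuzzy sampling procedure.
    """
    rng = random.Random(seed)
    counts = {label: y.count(label) for label in set(y)}
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")

    # The minority class is the label with the fewest rows.
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]

    # Copy the original data, then append one synthetic row per missing example.
    X_out = [list(row) for row in X]
    y_out = list(y)
    for _ in range(deficit):
        base = rng.choice(minority_rows)
        X_out.append([v + rng.uniform(-radius, radius) for v in base])
        y_out.append(minority)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: three rows of class 0 and one row of class 1.
    X = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [1.0, 1.0]]
    y = [0, 0, 0, 1]
    X_bal, y_bal = oversample_minority(X, y)
    print(y_bal.count(0), y_bal.count(1))
```

Synthetic rows should be added only to the training split, so that test data still reflects the original class distribution.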

The results of this paper show that we can achieve better than state-of-the-art results on code smell detection with fuzzy oversampling. For example, for “feature envy”, we were able to achieve 99+% AUC across all our datasets, and on 8/10 datasets for “misplaced class”. While our specific results refer to code smell detection, they suggest lessons for other kinds of analytics. For example: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; (c) try “fuzzy sampling” as one such baseline.
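For readers unfamiliar with the AUC metric quoted above, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as half. It illustrates the standard definition only. The function name `roc_auc` is hypothetical, and this is not necessarily how the authors computed their results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Labels are assumed to be 1 (positive) or 0 (negative). Ties count as 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four (positive, negative) pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

Because AUC depends only on how positives rank against negatives, it is less distorted by class imbalance than plain accuracy. The quadratic loop is fine for small examples; larger datasets would call for a rank-based implementation.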

Wed 18 May

Displayed time zone: Eastern Time (US & Canada)

13:00 - 13:50
Session 4: Software Quality (Bugs & Smells)
Data and Tool Showcase Track / Technical Papers at MSR Main room - odd hours
Chair(s): Maxime Lamothe Polytechnique Montreal, Montreal, Canada, Mahmoud Alfadel University of Waterloo
13:00
7m
Talk
Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue
Technical Papers
Rui Shu North Carolina State University, Tianpei Xia North Carolina State University, Laurie Williams North Carolina State University, Tim Menzies North Carolina State University
13:07
7m
Talk
To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set?
Technical Papers
Matteo Ciniselli Università della Svizzera Italiana, Luca Pascarella Università della Svizzera italiana (USI), Gabriele Bavota Software Institute, USI Università della Svizzera italiana
Pre-print
13:14
7m
Talk
How to Improve Deep Learning for Software Analytics (a case study with code smell detection)
Technical Papers
Rahul Yedida, Tim Menzies North Carolina State University
Pre-print
13:21
7m
Talk
Using Active Learning to Find High-Fidelity Builds
Technical Papers
Harshitha Menon Lawrence Livermore National Lab, Konstantinos Parasyris Lawrence Livermore National Laboratory, Todd Gamblin Lawrence Livermore National Laboratory, Tom Scogland Lawrence Livermore National Laboratory
Pre-print
13:28
4m
Talk
ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
Data and Tool Showcase Track
Hossein Keshavarz David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada, Mei Nagappan University of Waterloo
Pre-print
13:32
4m
Talk
ReCover: a Curated Dataset for Regression Testing Research
Data and Tool Showcase Track
Francesco Altiero Università degli Studi di Napoli Federico II, Anna Corazza Università degli Studi di Napoli Federico II, Sergio Di Martino Università degli Studi di Napoli Federico II, Adriano Peron Università degli Studi di Napoli Federico II, Luigi Libero Lucio Starace Università degli Studi di Napoli Federico II
13:36
14m
Live Q&A
Discussions and Q&A
Technical Papers

Tue 24 May

Displayed time zone: Eastern Time (US & Canada)

09:00 - 10:30
Blended Technical Session 3 (Smells and Maintenance)
Technical Papers / Mining Challenge / Registered Reports / Data and Tool Showcase Track at Room 315+316
Chair(s): Andy Zaidman Delft University of Technology
09:00
15m
Talk
Smelly Variables in Ansible Infrastructure Code: Detection, Prevalence, and Lifetime
Technical Papers
Ruben Opdebeeck Vrije Universiteit Brussel, Ahmed Zerouali Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print
09:15
15m
Talk
Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems
Technical Papers
Clara Marie Lüders University of Hamburg, Abir Bouraffa University of Hamburg, Walid Maalej University of Hamburg
DOI Pre-print
09:30
15m
Talk
How to Improve Deep Learning for Software Analytics (a case study with code smell detection)
Technical Papers
Rahul Yedida, Tim Menzies North Carolina State University
Pre-print
09:45
8m
Talk
npm-filter: Automating the mining of dynamic information from npm packages
Data and Tool Showcase Track
Ellen Arteca Northeastern University, Alexi Turcotte Northeastern University
Pre-print Media Attached
09:53
8m
Talk
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring
Best Mining Challenge Paper Award
Mining Challenge
Anthony Peruma Rochester Institute of Technology, Eman Abdullah AlOmar Stevens Institute of Technology, Christian D. Newman Rochester Institute of Technology, Mohamed Wiem Mkaouer Rochester Institute of Technology, Ali Ouni ETS Montreal, University of Quebec
Pre-print Media Attached
10:01
8m
Talk
CamBench - Cryptographic API Misuse Detection Tool Benchmark Suite
Registered Reports
Michael Schlichtig Heinz Nixdorf Institute at Paderborn University, Anna-Katharina Wickert TU Darmstadt, Germany, Stefan Krüger Independent Researcher, Eric Bodden University of Paderborn; Fraunhofer IEM, Mira Mezini TU Darmstadt
Pre-print
10:09
21m
Live Q&A
Discussions and Q&A
Technical Papers

