How to Improve Deep Learning for Software Analytics (a case study with code smell detection)
Tue 24 May 2022 09:30 - 09:45 at Room 315+316 - Blended Technical Session 3 (Smells and Maintenance) Chair(s): Andy Zaidman
To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, without much exploration of alternatives within that technology.
One promising alternative for software analytics and deep learning is “GHOST”, which combines hyper-parameter optimization of feedforward neural networks with a novel oversampling technique to handle class imbalance.
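To make the first ingredient concrete, here is a minimal sketch of hyper-parameter optimization for a feedforward network, assuming scikit-learn. The search space, the random-search strategy, and all settings below are illustrative assumptions; they are not GHOST’s actual tuner or grid.

```python
# A hedged sketch: random search over feedforward-network hyper-parameters
# on synthetic imbalanced data (a stand-in for a code-smell dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# 10% minority class, mimicking the class imbalance discussed in the paper.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Illustrative search space: layer shapes, regularization, learning rate.
param_space = {
    "hidden_layer_sizes": [(16,), (32,), (64,), (32, 16), (64, 32)],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_space,
    n_iter=20,
    scoring="roc_auc",  # AUC, the metric the paper reports
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```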
The prior TSE’21 study proposing this novel “fuzzy sampling” was somewhat limited in that the method was tested on defect prediction, but nothing else. Like defect prediction, code smell detection datasets suffer from class imbalance (which is what motivated “fuzzy sampling” in the first place). Hence, in this work we test whether fuzzy sampling is also useful for code smell detection.
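The sketch below shows fuzzy-sampling-style oversampling as we understand it from the TSE’21 paper: each minority-class example is surrounded by perturbed copies at increasing radii, so the learner trains on a “fuzzed” region around each rare case rather than a single point. The function name, parameters, and Gaussian noise model are hypothetical, not the paper’s exact algorithm.

```python
# A hedged sketch of fuzzy oversampling for an imbalanced dataset.
import numpy as np

def fuzzy_sample(X, y, minority_label=1, n_rings=3, step=0.01, copies_per_ring=2):
    """Oversample the minority class by adding randomly perturbed copies
    of each minority point at n_rings increasing noise scales."""
    rng = np.random.default_rng(0)
    X_min = X[y == minority_label]
    new_X, new_y = [X], [y]
    for ring in range(1, n_rings + 1):
        for _ in range(copies_per_ring):
            # Wider rings get proportionally larger perturbations.
            noise = rng.normal(scale=ring * step, size=X_min.shape)
            new_X.append(X_min + noise)
            new_y.append(np.full(len(X_min), minority_label))
    return np.vstack(new_X), np.concatenate(new_y)
```

In this reading, the minority class grows by `n_rings * copies_per_ring` copies per example, which both rebalances the classes and smooths the decision region around rare cases.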
The results of this paper show that we can achieve better than state-of-the-art results on code smell detection with fuzzy oversampling. For example, for “feature envy”, we were able to achieve 99+% AUC across all our datasets, and on 8/10 datasets for “misplaced class”. While our specific results refer to code smell detection, they do suggest lessons for other kinds of analytics: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; and (c) try “fuzzy sampling” as one such baseline.
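Lesson (b) is cheap to act on. The sketch below, assuming scikit-learn with stand-in data and a stand-in learner (neither taken from the paper), shows the kind of simple baseline that should accompany any deep-learning result, scored with the same AUC metric.

```python
# A hedged sketch of a simple-learner baseline, evaluated by AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for a code-smell dataset.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Lesson (a): a preprocessing step such as the fuzzy_sample sketch above
# would be applied to (X_tr, y_tr) here, before fitting any learner.

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("baseline AUC:", round(roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]), 3))
```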
Wed 18 May (displayed time zone: Eastern Time, US & Canada)
13:00 - 13:50 | Session 4: Software Quality (Bugs & Smells) | Data and Tool Showcase Track / Technical Papers | at MSR Main room - odd hours | Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Mahmoud Alfadel (University of Waterloo)
13:00 | 7m Talk | Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue | Technical Papers | Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies (North Carolina State University)
13:07 | 7m Talk | To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set? | Technical Papers | Matteo Ciniselli (Università della Svizzera Italiana), Luca Pascarella (Università della Svizzera italiana, USI), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print
13:14 | 7m Talk | How to Improve Deep Learning for Software Analytics (a case study with code smell detection) | Technical Papers | Pre-print
13:21 | 7m Talk | Using Active Learning to Find High-Fidelity Builds | Technical Papers | Harshitha Menon (Lawrence Livermore National Lab), Konstantinos Parasyris, Todd Gamblin, Tom Scogland (Lawrence Livermore National Laboratory) | Pre-print
13:28 | 4m Talk | ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction | Data and Tool Showcase Track | Hossein Keshavarz (David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada), Mei Nagappan (University of Waterloo) | Pre-print
13:32 | 4m Talk | ReCover: a Curated Dataset for Regression Testing Research | Data and Tool Showcase Track | Francesco Altiero, Anna Corazza, Sergio Di Martino, Adriano Peron, Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II)
13:36 | 14m Live Q&A | Discussions and Q&A | Technical Papers