On the Relative Value of Feature Selection Techniques for Code Smell Detection (APSEC 2024 - Technical Track)

Who

Zexian Zhang, Shuang Yin, Lin Zhu, Shan Gao, Haoxuan Chen, Wenhua Hu, Fuyang Li

Track

APSEC 2024 Technical Track

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 5 Dec 2024 16:00 - 16:30 at Room 4 (Xianglin Ballroom) - Session (15) Chair(s): Xiaoxue Ren

Abstract

Machine/deep learning-based code smell detection aims to develop a classification model based on code smell features to predict the presence of code smells in new code instances. To ensure accurate detection, it is crucial to eliminate irrelevant or redundant features that may negatively impact performance. Previous studies have produced inconsistent findings about the impact of feature selection techniques on code smell detection, possibly because they examined only a limited number of techniques. To address this gap, our study provides a comprehensive analysis of feature selection techniques in code smell detection. We investigate 34 feature selection techniques with 7 classification models to build code smell detection models on 6 code smell datasets. To assess their effects, we use 3 evaluation metrics, i.e., Precision, Recall, and F-measure, and compare the performance differences using the Scott-Knott effect size difference test and McNemar’s test. The results show that (1) not all feature selection techniques significantly improve detection performance; the techniques with better performance are chi-square, probabilistic significance, information gain, and symmetrical uncertainty; (2) in general, probabilistic significance should be used as the “generic” feature selection technique, because detection models using probabilistic significance can identify more of the same smelly instances than models using other methods; and (3) the high-frequency features selected by the four highest-performing techniques, which are important for identifying the corresponding code smells, differ from dataset to dataset.
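As a rough illustration of the evaluation described in the abstract, the sketch below computes Precision, Recall, and F-measure for two detectors and compares them with McNemar's test. It is not the authors' implementation: the function names, the toy labels, and the choice of the exact (binomial) form of McNemar's test are all assumptions made for illustration, since the abstract does not say which variant was used.

```python
from math import comb


def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F-measure for the positive (smelly = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar's test on the instances where two models disagree.

    Returns (b, c, p_value), where b counts instances only model A classifies
    correctly and c counts instances only model B classifies correctly.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    # Two-sided binomial tail under H0: each discordant pair favours A or B with p = 0.5.
    p_value = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p_value)


# Hypothetical toy labels (1 = smelly, 0 = clean); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # e.g. a model trained on selected features
pred_b = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]  # e.g. a model trained on all features

print("A:", precision_recall_f1(y_true, pred_a))
print("B:", precision_recall_f1(y_true, pred_b))
print("McNemar (b, c, p):", mcnemar_exact(y_true, pred_a, pred_b))
```

McNemar's test looks only at the instances on which the two models disagree, which is why it suits questions such as whether two detectors identify the same smelly instances rather than merely reaching similar aggregate scores.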

Zexian Zhang, Wuhan University of Technology, China
Shuang Yin, Wuhan University of Technology, China
Lin Zhu, Wuhan University of Technology, China
Shan Gao, Hokkaido University, Japan
Haoxuan Chen, Wuhan University of Technology, China
Wenhua Hu, Wuhan University of Technology, China
Fuyang Li, Wuhan University of Technology, China

Session Program

Thu 5 Dec
Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

16:00 - 17:30	Session (15), Technical Track, at Room 4 (Xianglin Ballroom). Chair(s): Xiaoxue Ren (Zhejiang University)

16:00 30m Talk		On the Relative Value of Feature Selection Techniques for Code Smell Detection (Technical Track)
	Zexian Zhang (Wuhan University of Technology), Shuang Yin (Wuhan University of Technology), Lin Zhu (Wuhan University of Technology), Shan Gao (Hokkaido University), Haoxuan Chen (Wuhan University of Technology), Wenhua Hu (Wuhan University of Technology), Fuyang Li (Wuhan University of Technology)

16:30 30m Talk		An Empirical Study on Self-Admitted Technical Debt in Quantum Software (Technical Track)
	Yuta Ishimoto (Kyushu University), Yuto Nakamura (Kyushu University), Ryota Katsube (Hitachi, Ltd.), Naoto Sato (Research & Development Group, Hitachi, Ltd.), Hideto Ogawa (Hitachi Ltd.), Masanari Kondo (Kyushu University), Yasutaka Kamei (Kyushu University), Naoyasu Ubayashi (Kyushu University)

17:00 30m Talk		Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt (Technical Track)
	Edi Sutoyo (Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen), Paris Avgeriou (University of Groningen, The Netherlands), Andrea Capiluppi (University of Groningen)