On the Relative Value of Feature Selection Techniques for Code Smell Detection
Machine- and deep-learning-based code smell detection aims to build a classification model from code smell features that predicts the presence of code smells in new code instances. To ensure accurate detection, it is crucial to eliminate irrelevant or redundant features that may degrade performance. Previous studies have reported inconsistent findings about the impact of feature selection techniques on code smell detection, possibly because each examined only a limited number of techniques. To address this gap, our study provides a comprehensive analysis of feature selection techniques for code smell detection. We investigate 34 feature selection techniques combined with 7 classification models to build code smell detection models on 6 code smell datasets. To assess their effects, we use 3 evaluation metrics (Precision, Recall, and F-measure) and compare performance differences using the Scott-Knott effect size difference test and McNemar’s test. The results show that (1) not all feature selection techniques significantly improve detection performance; the best-performing techniques are chi-square, probabilistic significance, information gain, and symmetrical uncertainty. (2) In general, probabilistic significance should be used as the “generic” feature selection technique, because detection models using it identify more of the same smelly instances than models using other methods. (3) The high-frequency features selected by the four best-performing techniques, which are important for identifying the corresponding code smells, differ across datasets.
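Three of the four best-performing techniques named above (chi-square, information gain, and symmetrical uncertainty) are filter methods: each scores a feature against the smelly/non-smelly label without training a classifier, and the top-ranked features are kept. The sketch below is a minimal from-scratch illustration, assuming the features have already been discretized (for example, binned code metrics). It is not the authors' implementation; the helper `rank_features`, the toy feature names `loc_high`, `wmc_high`, and `noise`, and the cut-off `k` are hypothetical. Probabilistic significance is omitted because its definition is not given here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(labels | feature): label entropy within each feature value, weighted by frequency."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature), i.e. the mutual information of feature and label."""
    return entropy(labels) - conditional_entropy(feature, labels)


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(labels)), which normalizes IG into [0, 1]."""
    h_sum = entropy(feature) + entropy(labels)
    return 0.0 if h_sum == 0 else 2.0 * information_gain(feature, labels) / h_sum


def chi_square(feature, labels):
    """Pearson chi-square statistic of the (feature value x class) contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for y, yc in y_counts.items():
            expected = fc * yc / n
            stat += (joint.get((f, y), 0) - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score, k):
    """Hypothetical helper: names of the k highest-scoring features under `score`."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return ranked[:k]


# Toy, made-up data: 8 code instances, 1 = smelly, 0 = clean.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
columns = {
    "loc_high": [1, 1, 1, 0, 0, 0, 0, 1],  # agrees with the label on every instance
    "wmc_high": [1, 0, 1, 0, 1, 0, 0, 1],  # partially associated with the label
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label in this sample
}

for score in (information_gain, symmetrical_uncertainty, chi_square):
    print(score.__name__, rank_features(columns, labels, score, k=2))
```

Keeping the top-k features under one score is only one way to turn a ranking into a selected subset; the abstract does not say how many features the study retained. For real datasets, scikit-learn's `sklearn.feature_selection.chi2` and `mutual_info_classif` provide related scores.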
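The abstract evaluates detection models with Precision, Recall, and F-measure and compares them with McNemar’s test. The sketch below computes the three metrics for the smelly (positive) class, plus one common form of McNemar’s test for two models evaluated on the same instances: it counts the discordant instances (one model correct, the other wrong) and applies a continuity-corrected chi-square statistic with one degree of freedom. The function names and toy predictions are hypothetical, and the study's exact test configuration (continuity correction, exact versus asymptotic) is not stated in the abstract, so treat those choices as assumptions.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure (F1) for the positive (smelly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test (chi-square, 1 df, continuity-corrected) on paired predictions.

    Returns (statistic, p_value). Only discordant instances matter:
    only_a = A correct and B wrong, only_b = A wrong and B correct.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(1 for t, pa, pb in triples if pa == t and pb != t)
    only_b = sum(1 for t, pa, pb in triples if pa != t and pb == t)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # identical correctness pattern: no evidence of a difference
    # Continuity correction, clamped at 0 so equal counts give a zero statistic.
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy, made-up labels and predictions for 10 instances (1 = smelly).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
pred_b = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print("model A (P, R, F1):", precision_recall_f1(y_true, pred_a))
print("model B (P, R, F1):", precision_recall_f1(y_true, pred_b))
print("McNemar (stat, p):", mcnemar(y_true, pred_a, pred_b))
```

With only four discordant instances, as in this toy example, the chi-square approximation is rough; an exact binomial version of McNemar’s test is usually preferred for small counts.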
Thu 5 Dec (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
16:00 - 17:30
16:00 (30 min talk, Technical Track): On the Relative Value of Feature Selection Techniques for Code Smell Detection. Zexian Zhang (Wuhan University of Technology), Shuang Yin (Wuhan University of Technology), Lin Zhu (Wuhan University of Technology), Shan Gao (Hokkaido University), Haoxuan Chen (Wuhan University of Technology), Wenhua Hu (Wuhan University of Technology), Fuyang Li (Wuhan University of Technology)
16:30 (30 min talk, Technical Track): An Empirical Study on Self-Admitted Technical Debt in Quantum Software. Yuta Ishimoto (Kyushu University), Yuto Nakamura (Kyushu University), Ryota Katsube (Hitachi, Ltd.), Naoto Sato (Research & Development Group, Hitachi, Ltd.), Hideto Ogawa (Hitachi Ltd.), Masanari Kondo (Kyushu University), Yasutaka Kamei (Kyushu University), Naoyasu Ubayashi (Kyushu University)
17:00 (30 min talk, Technical Track): Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt. Edi Sutoyo (Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen), Paris Avgeriou (University of Groningen, The Netherlands), Andrea Capiluppi (University of Groningen)