Classifying Bug Issue Types for Deep Learning-oriented Projects with Pre-Trained Model
Correctly classifying bug issue types plays a vital role in improving the quality of deep learning (DL)-oriented projects. Prior studies have proposed approaches based on Pre-Trained Models (PTMs) for issue type classification in traditional GitHub repositories, but DL-oriented projects differ from traditional software, especially in that their bugs have different causes and symptoms. More importantly, these PTM-based approaches are trained on issue reports whose labels are assigned by software users at submission time; such labels may be wrong and do not subdivide bug issue types. An automated approach for labeling issues in DL-oriented projects, built on ground-truth bug issue types, is therefore necessary for DL software repositories. To fill these gaps, we first manually labeled 9,073 issue reports from 11 DL-oriented projects to establish ground-truth labels. We then explored the effectiveness of six PTMs at identifying bug issues in DL software repositories. Our findings indicate that i) PTMs (especially BERT) identify bug issue types of DL software more precisely than prior DL approaches on all the datasets; ii) contrary to their performance in traditional software bug classification tasks, Software Engineering (SE) domain-specific PTMs do not perform significantly better than the general PTMs we compared and may even perform worse for DL bug issue classification; and iii) in cross-framework scenarios, the F1-score of PTMs declined by 18.5% to 19.8%. Although performance suffers in that setting, BERT still achieves the best results. In conclusion, we propose that PTM-based bug issue classification has potential for more widespread application, and we encourage future studies to further examine and verify the generalizability of PTM-based methods in software engineering.
Wed 4 Dec (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
16:00 - 17:30 | Session (7): Technical Track / ERA - Early Research Achievements, at Room 4 (Xianglin Ballroom) | Chair(s): Cuiyun Gao (Harbin Institute of Technology)
16:00 | 30m Talk | Automatic Commit Range Identification of Untagged Version | Technical Track | Yan Zhu (Zhejiang University), Lingfeng Bao (Zhejiang University), Chengjie Chen (Zhejiang University), Lexiao Zhang (School of Software Technology, Zhejiang University), Xin Yin (Zhejiang University), Chao Ni (Zhejiang University)
16:30 | 30m Talk | Classifying Bug Issue Types for Deep Learning-oriented Projects with Pre-Trained Model | Technical Track | Zixuan Zeng (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics), Yu Zhao, Lina Gong (Nanjing University of Aeronautics and Astronautics)
17:00 | 20m Talk | GHA-BFP: Framework for Automated Build Failure Prediction in GitHub Actions | ERA - Early Research Achievements | Jiatai Li (National University of Defense Technology), Yang Zhang (National University of Defense Technology, China), Tao Wang (National University of Defense Technology), Yiwen Wu (National University of Defense Technology)