APSEC 2024
Tue 3 - Fri 6 December 2024 China

Correctly classifying bug issue types plays a vital role in improving the quality of deep learning (DL)-oriented projects. Prior studies have proposed various approaches based on Pre-Trained Models (PTMs) for issue type classification in traditional GitHub repositories. However, DL-oriented projects differ from traditional software, especially in that their bugs have different causes and symptoms. More importantly, these PTM-based approaches are trained on issue reports that users labeled at submission time, and such labels can be wrong and do not subdivide bugs into finer-grained issue types. Therefore, DL software repositories need an automated approach, grounded in ground-truth bug issue types, for labeling issues in DL-oriented projects. To fill these gaps, we first manually labeled 9,073 issue reports from 11 DL-oriented projects to establish ground-truth labels. We then evaluated the effectiveness of six PTMs in identifying bug issues in DL software repositories. Our findings indicate that: i) PTMs (especially BERT) identify the bug issue types of DL software more precisely than prior DL approaches on all datasets; ii) contrary to their performance on traditional software bug classification tasks, Software Engineering (SE) domain-specific PTMs do not perform significantly better than the general-purpose PTMs we compared, and may even perform worse on DL bug issue classification; and iii) in cross-framework scenarios, the F1-scores of the PTMs decline by 18.5% to 19.8%, yet despite this drop BERT still achieves the best results. In conclusion, we suggest that PTM-based bug issue classification has potential for more widespread application, and we encourage future studies to further examine and verify the generalizability of PTM-based methods in software engineering.