
User stories serve as a fundamental tool in agile software development methodologies, articulating the functional requirements of a system from an end-user perspective. However, while user stories excel at delineating desired features and functionalities, they frequently overlook the non-functional aspects critical to a system's success. These non-functional aspects encompass a spectrum of quality concerns, including but not limited to performance, security, reliability, usability, and compatibility. Despite their paramount importance, these quality concerns often remain implicit or underrepresented in user stories, necessitating a deliberate effort to extract and elucidate them during the requirements elicitation process. Failure to address these quality concerns upfront can lead to architectural decisions that overlook critical performance bottlenecks, security vulnerabilities, reliability issues, and usability shortcomings. This oversight may in turn result in suboptimal system designs, increased development costs, delayed time-to-market, diminished user satisfaction, and heightened operational risks. This paper presents an ISO/IEC 25010-compliant transfer learning approach for the automated extraction of quality concerns from user stories and their corresponding acceptance criteria. The proposed solution is built upon the Transformer-based RoBERTa-Large model, leveraging and extending its pre-trained capabilities. The approach classifies user stories and acceptance criteria into five of the most critical quality concerns: Usability, Performance, Reliability, Security, and Compatibility. The process involves cleaning and preprocessing the dataset, followed by fine-tuning the pre-trained models on the refined data. A comparative analysis of three mainstream Transformer variants, RoBERTa-base, DistilBERT, and XLNet, is also provided. Given the lack of publicly available datasets in this scope, a dataset of approximately 1,000 user stories with acceptance criteria was compiled by mining 30 projects collected from different sources; this dataset was subsequently labeled through an extensive labeling activity. The findings suggest that the fine-tuned RoBERTa-Large variant achieves an impressive level of performance in terms of accuracy, precision, recall, and average F1 score.
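
The following is a minimal sketch of how such a fine-tuning setup could look with the Hugging Face transformers library: RoBERTa-Large is loaded with a five-way classification head and trained on the labeled user stories. The file names (user_stories_train.csv, user_stories_test.csv), column names ("text", "label"), and hyperparameters are illustrative assumptions, not the authors' exact configuration.

    # Sketch: fine-tuning RoBERTa-Large to classify user stories (plus acceptance
    # criteria) into five ISO/IEC 25010 quality concerns.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    LABELS = ["Usability", "Performance", "Reliability", "Security", "Compatibility"]

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-large",
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={label: i for i, label in enumerate(LABELS)},
    )

    # Hypothetical CSV files with a "text" column (user story + acceptance criteria)
    # and a "label" column (integer index into LABELS).
    dataset = load_dataset("csv", data_files={"train": "user_stories_train.csv",
                                              "test": "user_stories_test.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=256)

    dataset = dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="qc-roberta-large",
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        num_train_epochs=4,
        weight_decay=0.01,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        tokenizer=tokenizer,  # enables dynamic padding via the default data collator
    )
    trainer.train()
    print(trainer.evaluate())

The same script could be repeated with "roberta-base", "distilbert-base-uncased", or "xlnet-base-cased" as the model name to reproduce the kind of comparative analysis the abstract describes.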