APSEC 2024
Tue 3 - Fri 6 December 2024, China

In application (app) development, effectively harnessing user feedback is crucial for improving app quality and user satisfaction. However, the volume and unstructured nature of user reviews complicate these efforts, making it hard to accurately capture this feedback and integrate it into the development process. We automate the classification of issues in app reviews and examine how these issues correlate with code quality metrics (code smells and bug reports) and development activities (additions, deletions, and time to merge in pull requests), aiming to provide evidence-based guidance for prioritizing and addressing user feedback. Employing a Mining Software Repositories (MSR) approach, we gathered and analyzed reviews from seven open-source Android apps and evaluated three machine learning models for classifying the issues those reviews raise: Support Vector Machines (SVM), BERT, and a fine-tuned GPT-3.5. The fine-tuned GPT-3.5 model achieved the highest accuracy at 95%. We found statistically significant correlations between the classified issues, the code quality metrics, and the development activities, though these relationships varied across applications, underscoring the complex interplay between user feedback and the development process. Our study demonstrates that automated tools can reliably identify and classify feedback in app reviews, helping developers manage that feedback and allocate resources to improve app quality and user satisfaction.
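
The abstract names the models and the correlation analysis but not the implementation. As a rough illustration only, the pipeline might resemble the following Python sketch, which pairs the SVM baseline mentioned above with a Spearman rank correlation (the abstract does not name the statistical test; Spearman is an assumption here, chosen because it suits skewed count data). The file names, column names, and labels are hypothetical placeholders, not the authors' artifacts.

```python
# Minimal sketch, not the authors' code: an SVM baseline for classifying
# issue types in app reviews, plus a Spearman correlation check against
# a code quality metric. All data files and column names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical input: one row per review, with a manually labeled issue type.
reviews = pd.read_csv("reviews.csv")  # columns: app, text, issue_type

X_train, X_test, y_train, y_test = train_test_split(
    reviews["text"], reviews["issue_type"],
    test_size=0.2, random_state=42, stratify=reviews["issue_type"])

# TF-IDF features feeding a linear SVM: a common baseline for review mining.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Correlate per-app issue counts with a code quality metric such as
# code smell counts (metrics.csv is a hypothetical per-app summary).
metrics = pd.read_csv("metrics.csv")  # columns: app, code_smells
issue_counts = reviews.groupby("app").size().rename("issues").reset_index()
merged = metrics.merge(issue_counts, on="app")

# Spearman's rank correlation is robust to non-normal, skewed count data.
rho, p = spearmanr(merged["issues"], merged["code_smells"])
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```

The same scaffold would apply to the other correlates the study examines (bug reports, additions, deletions, and time to merge), substituting the relevant per-app column for `code_smells`.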