Analyzing user reviews posted in app stores has always been a point of interest for developers, as it helps them identify user requirements, uncover application issues, and plan releases successfully. Various approaches have been proposed that utilize natural language processing and machine learning techniques to explore and classify the diverse information provided in reviews. Consequently, a sufficient volume of accurately labeled data is needed to train and evaluate generalizable classifiers. Recent studies have identified this as a significant challenge for evaluating and comparing existing approaches, demonstrating that models trained on the current labeled datasets exhibit lower-than-expected performance when predicting on unseen datasets. In addition to app stores, user reviews can also be found on other platforms such as GitHub. A key advantage of these data sources is that issues are usually labeled by their developers for better issue management. Notably, issues often contain sections similar to the user reviews considered in previous research. In this paper, we leverage processed labeled issues (the auxiliary dataset) to augment manually labeled datasets (the primary dataset) and improve the generalizability of review classification models. To this end, we create the auxiliary dataset by aligning issue labels with the desired review labels (e.g., bug report, feature request, and other) and extracting the target information from issue bodies. First, we syntactically and semantically processed 5,641 issue labels from 999 repositories and identified those relevant to our targets. Then, we manually analyzed 577 issue templates and 2,089 frequently used sections in issue bodies, defining 19 language patterns to identify the target information in issue bodies. To create the primary dataset, we utilize five well-known datasets, prepared through standard processes in previous studies, and refine their labels into a unified format.
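To illustrate the label-mapping and pattern-based extraction steps described above, the following minimal sketch maps raw GitHub issue labels to review categories and locates template sections inside issue bodies. The `LABEL_MAP` entries and the two regular expressions are hypothetical stand-ins for the paper's 5,641 processed labels and 19 language patterns, not the actual artifacts.

```python
import re

# Hypothetical mapping from normalized issue labels to review categories;
# the paper's syntactic/semantic matching covers far more labels.
LABEL_MAP = {
    "bug": "bug report",
    "defect": "bug report",
    "crash": "bug report",
    "enhancement": "feature request",
    "feature": "feature request",
    "proposal": "feature request",
}

# Illustrative language patterns (the paper defines 19) for headings
# commonly found in issue templates.
SECTION_PATTERNS = [
    re.compile(r"^#+\s*(describe the bug|expected behavior)\b", re.I),
    re.compile(r"^#+\s*(is your feature request related|describe the solution)\b", re.I),
]

def map_label(raw_label: str) -> str:
    """Normalize a raw issue label and map it to a review category."""
    key = raw_label.strip().lower().replace("-", " ").split()[0]
    return LABEL_MAP.get(key, "other")

def extract_sections(issue_body: str) -> list[str]:
    """Return non-empty lines under headings matched by a language pattern."""
    sections, keep = [], False
    for line in issue_body.splitlines():
        if line.startswith("#"):
            keep = any(p.match(line) for p in SECTION_PATTERNS)
        elif keep and line.strip():
            sections.append(line.strip())
    return sections
```

For example, `map_label("enhancement-request")` falls into the feature-request category, while unmatched labels such as `question` map to `other`.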
Next, we integrate the auxiliary and primary datasets using three methods: Within-App (using issues from the same app), Within-Context (using issues from similar apps), and Between-App (using random issues) analysis. Finally, we use the augmented dataset to fine-tune a transformer-based model for both the bug report and feature request classifiers. We evaluate the impact of the proposed models on generalizability through multiple experiments. Results obtained on our standardized ground-truth dataset demonstrate that dataset augmentation can increase the F1-score by 6.3 and 7.2 for bug reports and feature requests, respectively. Additionally, our experiments revealed that the ratio of auxiliary dataset size to primary dataset size significantly affects the improvement achieved by the proposed approach, and we identified an effective range for this ratio between 0.3 and 0.7.
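The augmentation step can be sketched as follows, assuming each dataset is a list of `(text, label)` pairs. Random sampling from the auxiliary pool corresponds roughly to the Between-App setting, and the `ratio` parameter reflects the reported effective range of 0.3 to 0.7; all names are illustrative, not the paper's implementation.

```python
import random

def augment(primary, auxiliary, ratio, seed=0):
    """Mix auxiliary issue examples into the primary review set so that
    the auxiliary/primary size ratio is at most `ratio`.

    primary, auxiliary: lists of (text, label) pairs.
    """
    # Cap the auxiliary sample at ratio * |primary| (or the pool size).
    k = min(len(auxiliary), int(ratio * len(primary)))
    rng = random.Random(seed)
    sampled = rng.sample(auxiliary, k)
    merged = primary + sampled
    rng.shuffle(merged)  # avoid ordering artifacts during fine-tuning
    return merged
```

With 10 primary reviews, 20 auxiliary issues, and a ratio of 0.5, the merged set contains 15 examples; the result would then feed a standard transformer fine-tuning loop.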
Tue 29 Oct (displayed time zone: Pacific Time, US & Canada)
15:30 - 16:30 | Mobile app development and app review | Journal-first Papers / NIER Track / Tool Demonstrations at Magnolia

15:30 15m Talk | Can GitHub Issues Help in App Review Classifications? | Journal-first Papers

15:45 15m Talk | App Review Driven Collaborative Bug Finding | Journal-first Papers | Xunzhu Tang (University of Luxembourg), Haoye Tian (University of Melbourne), Pingfan Kong (Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg), Saad Ezzini (Lancaster University), Kui Liu (Huawei), Xin Xia (Huawei), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)

16:00 10m Talk | Assessing the feasibility of Micro frontend architecture in native mobile app development | NIER Track | Quentin Capdepon (LIRMM - University of Montpellier), Nicolas Hlad (Berger-Levrault), Benoit Verhaeghe (Berger-Levrault), Abdelhak Seriai (LIRMM, CNRS and University of Montpellier)

16:10 10m Talk | Model-based GUI Testing For HarmonyOS Apps | Tool Demonstrations | Yige Chen, Sinan Wang, Yida Tao, Yepang Liu (Southern University of Science and Technology)

16:20 10m Talk | Towards Extracting Ethical Concerns-related Software Requirements from App Reviews (Recorded Talk) | NIER Track