Analyzing user reviews posted in app stores has always been a point of interest for developers, as it helps them identify user requirements, uncover application issues, and plan releases successfully. Various approaches have been proposed that utilize natural language processing and machine learning techniques to explore and classify the diverse information provided in reviews. Consequently, a sufficient volume of accurately labeled data is needed to train and evaluate generalizable classifiers. Recent studies have identified this as a significant challenge for evaluating and comparing existing approaches, demonstrating that models trained on the current labeled datasets exhibit lower-than-expected performance when predicting on unseen datasets. In addition to app stores, user reviews can also be found on other platforms such as GitHub. A key advantage of these data sources is that issues are usually labeled by their developers for better issue management. Notably, issues often contain sections similar to the user reviews considered in previous research. In this paper, we leverage processed labeled issues (the auxiliary dataset) to augment manually labeled datasets (the primary dataset) and improve the generalizability of review classification models. To this end, we create the auxiliary dataset by aligning issue labels with the desired review labels (e.g., bug report, feature request, and other) and extracting the target information from issue bodies. First, we syntactically and semantically processed 5,641 issue labels from 999 repositories and identified those relevant to our targets. Then, we manually analyzed 577 issue templates and 2,089 frequently used sections in issue bodies, defining 19 language patterns to identify the target information in issue bodies. To create the primary dataset, we utilize five well-known datasets, prepared through standard processes in previous studies, and refine their labels into a unified format.
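To illustrate the label-mapping and pattern-based extraction steps described above, the following minimal sketch maps raw GitHub issue labels to review categories and locates template sections inside issue bodies. The `LABEL_MAP` entries and the two regular expressions are hypothetical stand-ins for the paper's 5,641 processed labels and 19 language patterns, not the actual artifacts.

```python
import re

# Hypothetical mapping from normalized issue labels to review categories;
# the paper's syntactic/semantic matching covers far more labels.
LABEL_MAP = {
    "bug": "bug report",
    "defect": "bug report",
    "crash": "bug report",
    "enhancement": "feature request",
    "feature": "feature request",
    "proposal": "feature request",
}

# Illustrative language patterns (the paper defines 19) for headings
# commonly found in issue templates.
SECTION_PATTERNS = [
    re.compile(r"^#+\s*(describe the bug|expected behavior)\b", re.I),
    re.compile(r"^#+\s*(is your feature request related|describe the solution)\b", re.I),
]

def map_label(raw_label: str) -> str:
    """Normalize a raw issue label and map it to a review category."""
    key = raw_label.strip().lower().replace("-", " ").split()[0]
    return LABEL_MAP.get(key, "other")

def extract_sections(issue_body: str) -> list[str]:
    """Return non-empty lines under headings matched by a language pattern."""
    sections, keep = [], False
    for line in issue_body.splitlines():
        if line.startswith("#"):
            keep = any(p.match(line) for p in SECTION_PATTERNS)
        elif keep and line.strip():
            sections.append(line.strip())
    return sections
```

For example, `map_label("enhancement-request")` falls into the feature-request category, while unmatched labels such as `question` map to `other`.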
Next, we integrate the auxiliary and primary datasets using three methods: Within-App (using issues from the same app), Within-Context (using issues from similar apps), and Between-App (using random issues) analysis. Finally, we use the augmented dataset to fine-tune a transformer-based model for both the bug report and feature request classifiers. We evaluate the impact of the proposed models on generalizability through multiple experiments. Results obtained on our standardized ground-truth dataset demonstrate that dataset augmentation can increase the F1-score by 6.3 and 7.2 for bug reports and feature requests, respectively. Additionally, our experiments revealed that the ratio of auxiliary dataset size to primary dataset size significantly affects the improvement achieved by the proposed approach, and we identified an effective range for this ratio between 0.3 and 0.7.
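The augmentation step can be sketched as follows, assuming each dataset is a list of `(text, label)` pairs. Random sampling from the auxiliary pool corresponds roughly to the Between-App setting, and the `ratio` parameter reflects the reported effective range of 0.3 to 0.7; all names are illustrative, not the paper's implementation.

```python
import random

def augment(primary, auxiliary, ratio, seed=0):
    """Mix auxiliary issue examples into the primary review set so that
    the auxiliary/primary size ratio is at most `ratio`.

    primary, auxiliary: lists of (text, label) pairs.
    """
    # Cap the auxiliary sample at ratio * |primary| (or the pool size).
    k = min(len(auxiliary), int(ratio * len(primary)))
    rng = random.Random(seed)
    sampled = rng.sample(auxiliary, k)
    merged = primary + sampled
    rng.shuffle(merged)  # avoid ordering artifacts during fine-tuning
    return merged
```

With 10 primary reviews, 20 auxiliary issues, and a ratio of 0.5, the merged set contains 15 examples; the result would then feed a standard transformer fine-tuning loop.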
Tue 29 Oct (displayed time zone: Pacific Time, US & Canada)
15:30 - 16:30 | Mobile app development and app review | Journal-first Papers / NIER Track / Tool Demonstrations at Magnolia

15:30 15m Talk | Can GitHub Issues Help in App Review Classifications? | Journal-first Papers

15:45 15m Talk | App Review Driven Collaborative Bug Finding | Journal-first Papers | Xunzhu Tang (University of Luxembourg), Haoye Tian (University of Melbourne), Pingfan Kong (Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg), Saad Ezzini (Lancaster University), Kui Liu (Huawei), Xin Xia (Huawei), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)

16:00 10m Talk | Assessing the feasibility of Micro frontend architecture in native mobile app development | NIER Track | Quentin Capdepon (LIRMM - University of Montpellier), Nicolas Hlad (Berger-Levrault), Benoit Verhaeghe (Berger-Levrault), Abdelhak Seriai (LIRMM, CNRS and University of Montpellier)

16:10 10m Talk | Model-based GUI Testing For HarmonyOS Apps | Tool Demonstrations | Yige Chen, Sinan Wang, Yida Tao, Yepang Liu (Southern University of Science and Technology)

16:20 10m Talk | Towards Extracting Ethical Concerns-related Software Requirements from App Reviews (Recorded Talk) | NIER Track