Automatic Prediction of Developers' Resolutions for Software Merge Conflicts (ASE 2024 - Journal-first Papers)

Who

Waad Riadh Aldndni, Na Meng, Francisco Servant

Track

ASE 2024 Journal-first Papers

When

Thu 31 Oct 2024 14:00 - 14:15 at Gardenia - Software Merge. Chair(s): Haiyan Zhao

Abstract

In collaborative software development, developers work in parallel on different branches that they merge periodically. When edits from different branches textually overlap, conflicts may occur. Manually resolving conflicts can be tedious and error-prone. Researchers have proposed tool support for conflict resolution, but these tools barely consider developers’ preferences. A conflict can be resolved in one of three ways: keeping the local version only (KL), keeping the remote version only (KR), or manually editing it (ME). Recent studies show that developers resolved the majority of textual conflicts by KL or KR. Thus, we created a machine learning-based approach, RPredictor, to predict developers’ resolution strategy (KL, KR, or ME) given a merge conflict.
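As a rough illustration of what KL and KR mean at the text level, the following sketch applies a chosen strategy to a file containing conflict markers. It is not part of RPredictor: the function name `resolve_conflicts` is hypothetical, the input is assumed to use Git's default conflict-marker layout, and the "local" version is assumed to be the side between `<<<<<<<` and `=======`.

```python
from typing import List, Optional


def resolve_conflicts(lines: List[str], strategy: str) -> Optional[List[str]]:
    """Apply KL or KR to every conflict block in `lines`.

    Returns None for ME, because a manual edit cannot be derived mechanically.
    Assumption: the local side precedes '=======' and the remote side follows it.
    """
    if strategy == "ME":
        return None
    if strategy not in ("KL", "KR"):
        raise ValueError(f"unknown strategy: {strategy}")

    resolved: List[str] = []
    side = None  # None outside a conflict; else "local", "base", or "remote"
    for line in lines:
        if line.startswith("<<<<<<<"):
            side = "local"
        elif side is not None and line.startswith("|||||||"):
            side = "base"  # only present with the diff3 conflict style
        elif side is not None and line.startswith("======="):
            side = "remote"
        elif side is not None and line.startswith(">>>>>>>"):
            side = None
        elif (
            side is None
            or (side == "local" and strategy == "KL")
            or (side == "remote" and strategy == "KR")
        ):
            resolved.append(line)
    return resolved


merged = [
    "a = 1",
    "<<<<<<< HEAD",
    "b = 2",
    "=======",
    "b = 3",
    ">>>>>>> feature",
    "c = 4",
]
print(resolve_conflicts(merged, "KL"))
print(resolve_conflicts(merged, "KR"))
```

Under these assumptions, KL keeps `b = 2` and KR keeps `b = 3`, while the unconflicted lines pass through unchanged in both cases.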

To explore the feasibility of creating a predictor for conflict resolution strategies, we first conducted an empirical study to characterize the conflicts in software version history that got resolved with different strategies. We gathered 15,758 conflicts from 100 open-source software repositories, and studied 12 features to characterize each conflict from different perspectives. Our statistical analysis shows a strong correlation between these features and developers’ resolution decisions, indicating a strong potential for successfully building a resolution predictor.

Leveraging the 12 features revealed by our study, we designed and implemented an approach, RPredictor, to automatically predict resolution strategies. RPredictor operates in two phases: training and testing. In the training phase, RPredictor extracts features for each conflict in a set of merge conflicts that were already resolved in the past, and trains a three-class random forest (RF) classifier. In the testing phase, it takes in any new conflict together with the software repository holding that conflict, extracts features, and applies the trained classifier to recommend a strategy. When the strategy is KL or KR, RPredictor also outputs the resolved version.

To evaluate RPredictor, we conducted large-scale experiments with 74,861 conflicts extracted from the version history of 482 open-source projects. We applied RPredictor to perform both within-project and cross-project prediction tasks. For the within-project setting, in each repository, we used the oldest 90% of resolved conflicts to train RPredictor and the remaining 10% of resolved conflicts for testing. RPredictor predicted resolutions with a 63% F-score. For the cross-project setting, we performed 10-fold cross validation. Namely, we divided the 482 software repositories evenly into 10 folds. In each experiment, we leveraged the conflict data in nine folds for training and used the conflict data from the remaining fold for testing. We repeated the experiment 10 times, with each experiment using a different fold for testing. RPredictor recommended resolutions with a 46% F-score.
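The two evaluation settings can be sketched as data-splitting routines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `timestamp` field, the integer rounding of the 90% cut, and the round-robin assignment of repositories to folds are all hypothetical choices, since the abstract does not specify them.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def chronological_split(
    conflicts: Sequence[Dict], train_percent: int = 90
) -> Tuple[List[Dict], List[Dict]]:
    """Within-project setting: oldest conflicts train, newest conflicts test."""
    ordered = sorted(conflicts, key=lambda c: c["timestamp"])
    cut = len(ordered) * train_percent // 100  # assumed rounding: floor
    return ordered[:cut], ordered[cut:]


def repository_folds(repos: Sequence[str], k: int = 10) -> List[List[str]]:
    """Cross-project setting: divide repositories (not conflicts) into k folds."""
    ordered = sorted(repos)
    # Round-robin assignment keeps fold sizes within one repository of each other.
    return [ordered[i::k] for i in range(k)]


def cross_project_rounds(
    repos: Sequence[str], k: int = 10
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_repos, test_repos) once per fold, each fold tested exactly once."""
    folds = repository_folds(repos, k)
    for i, test_repos in enumerate(folds):
        train_repos = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train_repos, test_repos


# Usage with placeholder data (repository names and timestamps are made up).
history = [{"id": i, "timestamp": 100 - i} for i in range(20)]
train, test = chronological_split(history)
print(len(train), len(test))

repos = [f"repo{i:03d}" for i in range(482)]
for train_repos, test_repos in cross_project_rounds(repos):
    assert not set(train_repos) & set(test_repos)
```

Splitting by repository rather than by individual conflict means that no project contributes data to both training and testing within a round, which is the property that makes the setting cross-project.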

We made the following contributions in this paper:
• A novel empirical study of 12 characteristics of 15,758 conflicts to understand their correlation with resolutions KL, KR, or ME.
• A novel tool, RPredictor, that leverages machine learning (ML) to predict the resolution strategy for a given conflict.
• A comprehensive evaluation of RPredictor’s effectiveness with 74,861 conflicts from 482 repositories.
• An evaluation of RPredictor’s sensitivity to different ML configurations.
• A customizable variant, RPredictor𝑣, allowing developers to choose more or less conservative results.

This paper was accepted for publication by the Journal of Systems and Software (JSS) in September 2023. Our work is not a secondary study but presents entirely new research findings and innovative contributions that have not been previously reported. The paper has not been presented at, nor is it under consideration for, journal-first programs of other conferences. The first author Waad Aldndni will give the presentation. If accepted, this paper will be the only paper that Waad presents at ASE 2024. Therefore, the acceptance will definitely increase Waad’s opportunity to attend ASE. Additionally, the paper would be ineligible as a journal-first presentation at the next SE3 conference (ICSE/FSE/ASE) because its acceptance date is likely to precede the next ASE’s window of journal acceptance dates, and JSS papers are not eligible to present at either ICSE or FSE. The paper can be accessed at https://doi.org/10.1016/j.jss.2023.111836.

Waad Riadh Aldndni

Virginia Tech, Blacksburg, VA, U.S.A.

Na Meng

Virginia Tech

United States

Francisco Servant

ITIS Software, University of Malaga

Spain