Automatic Prediction of Developers' Resolutions for Software Merge Conflicts
In collaborative software development, developers work in parallel on different branches that they merge periodically. When edits from different branches textually overlap, merge conflicts may occur. Manually resolving conflicts can be tedious and error-prone. Researchers have proposed tool support for conflict resolution, but these tools barely consider developers’ preferences. A conflict can be resolved in one of three ways: keeping the local version only (KL), keeping the remote version only (KR), or manually editing it (ME). Recent studies show that developers resolve the majority of textual conflicts with KL or KR. We therefore created RPredictor, a machine learning-based approach that predicts developers’ resolution strategy (KL, KR, or ME) for a given merge conflict.
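To make the three strategies concrete, here is a minimal Python sketch that labels an already-resolved conflict as KL, KR, or ME. It compares the developer's resolved chunk with the local and remote versions of the conflicting chunk. This is an assumption about how such labels could be derived from version history and is not necessarily the paper's exact procedure. The whitespace normalization and the names `Strategy` and `label_resolution` are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    KL = "keep local"
    KR = "keep remote"
    ME = "manual edit"


def _normalize(lines):
    # Hypothetical choice: ignore trailing whitespace and blank lines so that
    # purely cosmetic differences do not turn a KL/KR resolution into ME.
    return [line.rstrip() for line in lines if line.strip()]


def label_resolution(local, remote, resolved):
    """Label how a conflict was resolved, given the local, remote, and
    resolved versions of the conflicting chunk as lists of lines."""
    local_n, remote_n, resolved_n = map(_normalize, (local, remote, resolved))
    if resolved_n == local_n:
        return Strategy.KL
    if resolved_n == remote_n:
        return Strategy.KR
    # Anything else (merging both sides, rewriting, etc.) counts as manual editing.
    return Strategy.ME
```

For example, `label_resolution(["x = 1"], ["x = 2"], ["x = 2"])` returns `Strategy.KR`. A resolution that keeps both lines, `["x = 1", "x = 2"]`, returns `Strategy.ME`.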
To explore the feasibility of building a predictor for conflict resolution strategies, we first conducted an empirical study to characterize the conflicts in software version history that were resolved with different strategies. We gathered 15,758 conflicts from 100 open-source software repositories and studied 12 features that characterize each conflict from different perspectives. Our statistical analysis shows a strong correlation between these features and developers’ resolution decisions, indicating strong potential for building a successful resolution predictor.
Leveraging the 12 features revealed by our study, we designed and implemented RPredictor, an approach that automatically predicts resolution strategies. RPredictor operates in two phases: training and testing. In the training phase, RPredictor extracts features for each conflict in a set of merge conflicts that were already resolved in the past, and trains a three-class random forest (RF) classifier. In the testing phase, it takes a new conflict together with the software repository containing that conflict, extracts its features, and applies the trained classifier to recommend a strategy. When the strategy is KL or KR, RPredictor also outputs the resolved version.
To evaluate RPredictor, we conducted large-scale experiments with 74,861 conflicts extracted from the version history of 482 open-source projects. We applied RPredictor to both within-project and cross-project prediction tasks. In the within-project setting, for each repository, we used the oldest 90% of resolved conflicts to train RPredictor and the remaining 10% for testing. RPredictor predicted resolutions with an F-score of 63%. In the cross-project setting, we performed 10-fold cross-validation: we divided the 482 software repositories evenly into 10 folds, and in each experiment used the conflict data from nine folds for training and the conflict data from the remaining fold for testing. We repeated the experiment 10 times, each time using a different fold for testing. RPredictor recommended resolutions with an F-score of 46%.
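The two evaluation settings described above can be sketched in a few lines of Python. The sketch covers a chronological 90%/10% split within one repository, a 10-fold split over repositories for the cross-project setting, and a three-class F-score. It assumes that each conflict carries a hypothetical `timestamp` key and that "F-score" means the unweighted (macro) average of per-class F1 over KL, KR, and ME. The paper may use a different averaging scheme or fold assignment, and the function names are illustrative only. The classifier itself is omitted.

```python
import random

LABELS = ("KL", "KR", "ME")


def chronological_split(conflicts, test_percent=10):
    """Within-project setting: train on the oldest conflicts, test on the newest.

    Each conflict is a dict with a hypothetical "timestamp" key (e.g., the
    commit time of the merge). The newest test_percent of conflicts,
    rounded up, form the test set.
    """
    ordered = sorted(conflicts, key=lambda c: c["timestamp"])
    n_test = (len(ordered) * test_percent + 99) // 100  # integer ceiling
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def repository_folds(repos, k=10, seed=0):
    """Cross-project setting: shuffle repositories, deal them into k folds of
    (near-)equal size, and yield one (train_repos, test_repos) pair per fold."""
    shuffled = list(repos)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [repo for j, fold in enumerate(folds) if j != i for repo in fold]
        yield train, test


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores over KL, KR, and ME."""
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With 482 repositories and `k=10`, two folds receive 49 repositories and the other eight receive 48. Because every repository lands in exactly one test fold, each conflict is tested once across the 10 runs. Splitting by repository rather than by individual conflict also keeps data from the test projects out of training, which the cross-project setting requires.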
We made the following contributions in this paper:
• A novel empirical study of 12 characteristics of 15,758 conflicts to understand their correlation with the resolutions KL, KR, and ME.
• A novel tool, RPredictor, that leverages machine learning (ML) to predict the resolution strategy for a given conflict.
• A comprehensive evaluation of RPredictor’s effectiveness with 74,861 conflicts from 482 repositories.
• An evaluation of RPredictor’s sensitivity to different ML configurations.
• A customizable variant, RPredictor𝑣, that allows developers to choose more or less conservative results.
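This summary does not say how RPredictor𝑣 makes its results more or less conservative. One hypothetical way to expose such a knob is a confidence threshold on the classifier's per-class scores. With this approach, KL or KR is recommended only when its score clears the threshold, and the tool otherwise falls back to ME (manual editing). The Python sketch below illustrates that idea and also returns the resolved version for KL/KR, as RPredictor does. The `recommend` function and the `threshold` parameter are assumptions, not the paper's design.

```python
def recommend(probabilities, local, remote, threshold=0.5):
    """Turn per-class scores into a recommendation (hypothetical sketch).

    probabilities: dict mapping "KL", "KR", and "ME" to classifier scores.
    threshold: hypothetical knob; raising it makes the output more conservative
    by falling back to "ME" unless KL or KR is predicted confidently enough.
    Returns (strategy, resolved_lines), where resolved_lines is None for ME.
    """
    best = max(probabilities, key=probabilities.get)
    if best in ("KL", "KR") and probabilities[best] >= threshold:
        # For KL/KR, also output the resolved version: the kept side, verbatim.
        return best, list(local if best == "KL" else remote)
    return "ME", None
```

For example, with scores `{"KL": 0.6, "KR": 0.3, "ME": 0.1}` the default threshold yields `("KL", local)`, while `threshold=0.8` yields `("ME", None)`. Those conflicts are then left for the developer to resolve by hand.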
This paper was accepted for publication in the Journal of Systems and Software (JSS) in September 2023. Our work is not a secondary study; it presents entirely new research findings and contributions that have not been previously reported. The paper has not been presented at, and is not under consideration for, the journal-first programs of other conferences. The first author, Waad Aldndni, will give the presentation. If accepted, this will be the only paper that Waad presents at ASE 2024, so acceptance will definitely increase Waad’s opportunity to attend ASE. Additionally, the paper would be ineligible for a journal-first presentation at the next SE3 conference (ICSE/FSE/ASE): its acceptance date is likely to precede the next ASE’s window of journal acceptance dates, and JSS papers are not eligible for presentation at either ICSE or FSE. The paper can be accessed at https://doi.org/10.1016/j.jss.2023.111836.
Thu 31 Oct (times in Pacific Time, US & Canada)
13:30 - 15:00 | Software Merge (Research Papers / Journal-first Papers) at Gardenia. Chair: Haiyan Zhao, Peking University
13:30 (15 min talk) | Evaluation of Version Control Merge Tools | Research Papers | Benedikt Schesch (ETH Zurich), Ryan Featherman (UW CSE), Ben Roberts (UW CSE), Kenneth J Yang (UW CSE), Michael D. Ernst (University of Washington)
13:45 (15 min talk) | Semistructured Merge with Language-Specific Syntactic Separators | Research Papers | Guilherme Cavalcanti (Federal Institute of Pernambuco, Brazil), Paulo Borba (Federal University of Pernambuco), Leonardo dos Anjos (Federal University of Pernambuco), Jonatas Clementino (Federal University of Pernambuco)
14:00 (15 min talk) | Automatic Prediction of Developers' Resolutions for Software Merge Conflicts | Journal-first Papers | Waad Riadh Aldndni (Virginia Tech, Blacksburg, VA, U.S.A.), Na Meng (Virginia Tech), Francisco Servant (ITIS Software, University of Malaga)
14:15 (15 min talk) | ConflictBench: A Benchmark to Evaluate Software Merge Tools | Journal-first Papers
14:30 (15 min talk) | Revisiting the Conflict-Resolving Problem from a Semantic Perspective | Research Papers | Jinhao Dong (Peking University), Jun Sun (Singapore Management University), Yun Lin (Shanghai Jiao Tong University), Yedi Zhang (National University of Singapore), Murong Ma (National University of Singapore), Jin Song Dong (National University of Singapore), Dan Hao (Peking University)