EarlyPR: Early Prediction of Potential Pull-Requests from Forks
Abstract—In this work, we propose EarlyPR, a framework that identifies and predicts potential pull-request (PR) contributions from an open source software (OSS) project’s forks. EarlyPR can potentially improve the efficiency of fork-and-pull based development in OSS projects by providing early warnings of duplicated and rejected contributions and by detecting lost contributions. Traditional PR-based studies rely on the descriptions and contents that creators provide with their PRs, which are available only after the PRs are created. EarlyPR instead makes predictions before PRs are created by mining the forks’ commit histories. This task is challenging for two reasons: the number of commit subsets in a fork’s history that could form a PR grows explosively, and no information about the resulting real PRs is available. To tackle these challenges, we adopt a state-of-the-art, Transformer-based architecture that extracts rich statistical and content information from forks and their commits to support the prediction of potential PR contributions. To make the algorithms scalable, we devise a TemporalFilter that finds candidate PRs by mimicking the real-world process of picking subsets of commits from a fork’s commit history when creating PRs. Experimental results on data from real-world OSS projects and their forks suggest that EarlyPR is effective at predicting PRs, which are essentially sets of commits selected from forks: when predicted and real PRs are matched under a stringent criterion of intersection over union (IoU) > 0.5, EarlyPR achieves a hitting rate of 0.790 and a missing rate of 0.367. We further demonstrate that EarlyPR’s predictions can be used to forecast whether PRs will be merged, with an accuracy of 70.8%.
In summary, the proposed approach can potentially improve the efficiency of fork-and-pull based OSS development by making accurate and early predictions of PR contributions from distributed, often independently developed forks.
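To illustrate why candidate filtering and IoU-based matching matter, the following Python sketch assumes that a PR is represented as a set of commit SHAs. All helper names (`candidate_windows`, `iou`, `hit_and_miss_rates`) are hypothetical. `candidate_windows` is a simple contiguous-window heuristic swapped in for illustration. It is not EarlyPR's TemporalFilter, whose details the abstract does not give. The metric definitions are also assumptions: hitting rate is taken as the share of predicted PRs that match some real PR, and missing rate as the share of real PRs that no prediction matches.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a PR (real or candidate) is a set of commit SHAs.
CommitSet = FrozenSet[str]


def candidate_windows(history: Sequence[str], max_len: int) -> List[CommitSet]:
    """Hypothetical temporal heuristic (NOT EarlyPR's TemporalFilter).

    Keeps only runs of consecutive commits, in chronological order, that are
    at most max_len long, instead of all 2**n - 1 non-empty subsets.
    """
    n = len(history)
    candidates: List[CommitSet] = []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            candidates.append(frozenset(history[start:end]))
    return candidates


def iou(a: CommitSet, b: CommitSet) -> float:
    """Intersection over union of two commit sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hit_and_miss_rates(
    predicted: List[CommitSet], actual: List[CommitSet], threshold: float = 0.5
) -> Tuple[float, float]:
    """Assumed metric definitions, using a strict IoU > threshold match:
    hitting rate = share of predicted PRs that match at least one real PR;
    missing rate = share of real PRs that no predicted PR matches.
    """
    def matches_any(x: CommitSet, pool: List[CommitSet]) -> bool:
        return any(iou(x, y) > threshold for y in pool)

    hit = sum(matches_any(p, actual) for p in predicted) / len(predicted) if predicted else 0.0
    miss = sum(not matches_any(r, predicted) for r in actual) / len(actual) if actual else 0.0
    return hit, miss


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5"]  # toy fork history, oldest first
    print(len(candidate_windows(history, max_len=3)), "of", 2 ** len(history) - 1)  # 12 of 31

    predicted = [frozenset({"c1", "c2"}), frozenset({"c4", "c5"})]
    actual = [frozenset({"c1", "c2", "c3"}), frozenset({"c5"})]
    print(hit_and_miss_rates(predicted, actual))  # (0.5, 0.5)
```

On the toy five-commit history, keeping only runs of at most three consecutive commits leaves 12 candidates instead of 31 non-empty subsets. The gap between these counts widens quickly as histories grow. Because the match is strict (`>`), a prediction whose IoU is exactly 0.5, such as `{c4, c5}` against `{c5}`, does not count as a hit.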
Fri 7 Mar (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | Change Management & Program Comprehension | Reproducibility Studies and Negative Results (RENE) Track / Research Papers / Early Research Achievement (ERA) Track | at L-1710 | Chair(s): Masud Rahman (Dalhousie University)
11:00 | 15m Talk | AdvFusion: Adapter-based Knowledge Transfer for Code Summarization on Code Language Models | Research Papers | Iman Saberi (University of British Columbia Okanagan), Amirreza Esmaeili (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia), Chen Fuxiang (University of Leicester)
11:15 | 15m Talk | EarlyPR: Early Prediction of Potential Pull-Requests from Forks | Research Papers
11:30 | 15m Talk | The Hidden Challenges of Merging: A Tool-Based Exploration | Research Papers | Luciana Gomes (UFCG), Melina Mongiovi (Federal University of Campina Grande, Brazil), Sabrina Souto (UEPB), Everton L. G. Alves (Federal University of Campina Grande)
11:45 | 7m Talk | On the Performance of Large Language Models for Code Change Intent Classification | Early Research Achievement (ERA) Track | Issam Oukay (Department of Software and IT Engineering, ETS Montreal, University of Quebec, Montreal, Canada), Moataz Chouchen (Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada), Ali Ouni (ETS Montreal, University of Quebec), Fatemeh Hendijani Fard (University of British Columbia)
11:52 | 15m Talk | Revisiting Method-Level Change Prediction: Comparative Evaluation at Different Granularities | Reproducibility Studies and Negative Results (RENE) Track | Hiroto Sugimori (School of Computing, Institute of Science Tokyo), Shinpei Hayashi (Institute of Science Tokyo)