Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning
In the GitHub ecosystem, workflows are used as an effective means to automate development tasks and to set up a Continuous Integration and Delivery (CI/CD) pipeline. GitHub Actions (GHA) have been conceived to provide developers with a practical tool to create and maintain workflows, avoiding “reinventing the wheel” and cluttering the workflow with shell commands. Properly leveraging the power of GitHub Actions can facilitate development processes, enhance collaboration, and significantly impact project outcomes. To expose actions to search engines, GitHub allows developers to manually assign them to one or more categories. These categories group actions that share similar functionality. Nevertheless, while actions provide a practical way to execute workflows, many of them have unclear purposes, and some are not categorized at all. In this work, we bridge this gap by conceptualizing Gavel, a practical solution for increasing the visibility of actions in GitHub. By leveraging the content of the README.MD file of each action, we use a Transformer (a deep learning algorithm) to assign suitable categories to the action. We conducted an empirical investigation and compared Gavel with a state-of-the-art baseline. The experimental results show that our proposed approach can assign categories to GitHub actions effectively, outperforming the state-of-the-art baseline.
Thu 24 Oct (displayed time zone: Brussels, Copenhagen, Madrid, Paris)
11:00 - 12:35 | Open source software and repository mining | ESEM Technical Papers / ESEM Emerging Results, Vision and Reflection Papers Track at Multimedia (B3 Building - Hall) | Chair(s): Davide Taibi (University of Oulu)
11:00 | 20m | Full-paper | Sustaining Maintenance Labor for Healthy Open Source Software Projects through Human Infrastructure: A Maintainer Perspective | ESEM Technical Papers | Johan Linåker (RISE Research Institutes of Sweden), Georg Link (Bitergia), Kevin Lumbard (Creighton University)
11:20 | 20m | Full-paper | Documenting Ethical Considerations in Open Source AI Models | ESEM Technical Papers | Haoyu Gao (The University of Melbourne), Mansooreh Zahedi (The University of Melbourne), Christoph Treude (Singapore Management University), Sarita Rosenstock (The University of Melbourne), Marc Cheong (The University of Melbourne) | Pre-print
11:40 | 20m | Full-paper | An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software | ESEM Technical Papers | Lucas Franke (Virginia Tech), Huayu Liang (Virginia Tech), Sahar Farzanehpour (Virginia Tech), Aaron Brantly (Virginia Tech), James C. Davis (Purdue University), Chris Brown (Virginia Tech) | Pre-print
12:00 | 20m | Full-paper | An Empirical Study of API Misuses of Data-Centric Libraries | ESEM Technical Papers | Akalanka Galappaththi (University of Alberta), Sarah Nadi (New York University Abu Dhabi; University of Alberta), Christoph Treude (Singapore Management University) | Pre-print
12:20 | 15m | Vision and Emerging Results | Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning | ESEM Emerging Results, Vision and Reflection Papers Track | Phuong T. Nguyen (University of L'Aquila), Juri Di Rocco (University of L'Aquila), Claudio Di Sipio (University of L'Aquila), Mudita Shakya (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Massimiliano Di Penta (University of Sannio, Italy) | Pre-print