SANER 2025
Tue 4 - Fri 7 March 2025 Montréal, Québec, Canada

GitHub Actions (GHA) has become one of the most popular Continuous Integration (CI) platforms in open-source software (OSS) and commercial projects. Collecting CI build data is crucial for practitioners and researchers, as it enables build performance monitoring, optimization, and improvement. However, mining GHA builds to collect build-related data and metrics remains challenging and time-consuming. In this paper, we introduce GHAminer, an open-source tool designed to collect build-related metrics for GitHub Actions. GHAminer covers several kinds of data, including build-related code changes and tests, build duration and status (e.g., passed, failed, timeout), and repository metadata, which help practitioners and researchers make data-driven decisions to enhance CI efficiency and quality. The tool has a modular architecture that supports efficient data extraction with minimal API load. Specifically, it consists of modules for repository information collection, build analysis, commit history analysis, and build log parsing. We evaluate the performance of GHAminer on a representative sample of 1,151 OSS projects. Results show that GHAminer handles projects of various sizes efficiently and that its performance remains relatively stable when collecting build data for larger projects. GHAminer is publicly available with a demo video at: https://github.com/stilab-ets/GHAminer
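
To make the kind of build-level metrics mentioned above concrete, the following is a minimal sketch, not GHAminer's actual code. It assumes a payload shaped like the response of GitHub's REST endpoint for listing workflow runs (`GET /repos/{owner}/{repo}/actions/runs`), and the helper names `parse_ts` and `summarize_runs` are hypothetical. Duration is approximated as the gap between `run_started_at` and `updated_at`, which is an assumption made for illustration only.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a UTC timestamp of the form 2025-03-04T10:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def summarize_runs(payload):
    """Extract status and an approximate duration for each workflow run.

    `payload` is assumed to be the parsed JSON of a "list workflow runs"
    response, i.e. a dict with a "workflow_runs" list.
    """
    rows = []
    for run in payload.get("workflow_runs", []):
        # Fall back to created_at if run_started_at is missing.
        started = run.get("run_started_at") or run.get("created_at")
        ended = run.get("updated_at")  # approximation of the end time
        duration = None
        if started and ended:
            duration = (parse_ts(ended) - parse_ts(started)).total_seconds()
        rows.append({
            "run_id": run.get("id"),
            "commit": run.get("head_sha"),
            "status": run.get("status"),          # e.g. "completed"
            "conclusion": run.get("conclusion"),  # e.g. "success", "failure", "timed_out"
            "duration_s": duration,
        })
    return rows
```

Keeping the function free of network calls means it can be applied to cached responses, which is one simple way to limit API load when the same runs are analyzed more than once.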