Exploring Activity and Contributors on GitHub: Who, What, When, and Where
Beyond being a code hosting platform, GitHub is where large-scale open collaboration and contribution happen. Every minute, thousands of developers submit code and discuss issues or pull requests, and all of this user behavior is recorded in the GitHub Event Stream (GES). Exploring the activities in the GES can help reveal who is active, how they work, when they are active, and even where they are located. To this end, we first performed a large-scale analysis of the 0.86 billion event records generated in 2020. We extracted 902K active contributors from 14 million GitHub accounts by observing their activity distribution, then explored their behavior distribution, their active time across the day and week, and their time zone distribution, estimated from their circadian activity rhythm. To go deeper, we conducted a case study of 79 projects in CNCF, together with contrast analyses across project maturity levels. Our results show that, from a macro perspective, bots are increasingly active and can serve numerous projects. Contributors work on weekdays, and global activity leans toward daytime working hours in the Americas and Europe. The time zone distribution also shows that UTC+2 and UTC-4 have the most active contributors. A critical finding was the validation and quantification of a high bus factor risk in the open source software (OSS) ecosystem: whether viewed across the whole contributor population or within specific projects, a rather small group of OSS contributors (less than 20%) undertook the majority of the work. The GES provides a wealth of information about OSS. Our findings offer insight into global GitHub collaboration behaviors and may help researchers and practitioners better understand the modern OSS ecosystem.
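The concentration claim in the abstract (a small share of contributors doing most of the work) can be illustrated with a short calculation. The following is a minimal sketch, not the authors' method: the function name `core_contributor_share`, the `(actor, event_type)` tuple layout, and the toy data are all assumptions made for illustration, and the 50% default threshold is one possible reading of "the majority of the work".

```python
from collections import Counter


def core_contributor_share(events, work_share=0.5):
    """Smallest fraction of contributors whose events cover `work_share` of all events.

    `events` is an iterable of (actor, event_type) pairs; this layout is an
    assumption for illustration, not the GES record schema.
    """
    counts = Counter(actor for actor, _ in events)
    total = sum(counts.values())
    if total == 0:
        return 0.0

    covered = 0
    # Walk contributors from most to least active until the target share is reached.
    for rank, (_, n_events) in enumerate(counts.most_common(), start=1):
        covered += n_events
        if covered >= work_share * total:
            return rank / len(counts)
    return 1.0


# Toy data (hypothetical actors): one contributor produces 6 of 10 events.
toy_events = (
    [("alice", "PushEvent")] * 6
    + [("bob", "IssuesEvent")] * 2
    + [("carol", "PushEvent"), ("dave", "WatchEvent")]
)
share = core_contributor_share(toy_events)
```

In the toy data a single contributor out of four already covers more than half of the events, so `share` is 0.25. On real data, a low value of this ratio is one simple indicator of the kind of bus factor risk the abstract describes.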
Thu 8 Dec (displayed time zone: Osaka, Sapporo, Tokyo)
15:00 - 16:30 | Empirical Studies 2 (Technical Track), Room2 | Chair(s): Yusuf Sulistyo Nugroho, Universitas Muhammadiyah Surakarta
15:00 | 20m | Paper | Exploring Activity and Contributors on GitHub: Who, What, When, and Where | Xiaoya Xia (East China Normal University), Zhenjie Weng (East China Normal University), will wang, Shengyu Zhao (Tongji University)
15:20 | 20m | Paper | The Language of Programming: On the Vocabulary of Names
15:40 | 20m | Paper | An Empirical Study of Predicting Fault-prone Components and their Evolution
16:00 | 20m | Paper | Empirical Study of Co-Renamed Identifiers | Yuki Osumi, Naotaka Umekawa, Hitomi Komata, Shinpei Hayashi (all Tokyo Institute of Technology)