A Dataset of Bot and Human Activities in GitHub
Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort-intensive and error-prone tasks. To study how these bots are used, state-of-the-art bot identification tools detect bots based on their comments in commits, issues and pull requests. Because bots can be involved in many other activity types, a wider range of the activities they carry out in software repositories needs to be considered. We therefore propose a curated dataset of such activities carried out by bots and humans involved in GitHub repositories. The dataset was constructed by identifying 24 high-level activity types that could be extracted from 15 lower-level GitHub event types, which were queried from GitHub’s event stream API for all considered bots and humans. The proposed dataset contains around 600K activities performed by 384 bots and 585 humans involved in GitHub repositories during an observation period from 25 November 2022 to 25 January 2023. This dataset is valuable for future empirical studies of how bots impact the software development process.
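As a rough illustration of how lower-level GitHub events might be lifted to higher-level activities, the sketch below maps a pair of event type and payload action to an activity label and counts activities per contributor. The event type names and the `type`, `actor.login` and `payload.action` fields follow GitHub's public event format, but the activity labels, the `ACTIVITY_RULES` table and the `classify` and `summarize` helpers are hypothetical. The abstract does not list the dataset's 24 activity types or its extraction rules, so this should not be read as the authors' method.

```python
from collections import Counter

# Hypothetical rules: (GitHub event type, payload action) -> activity label.
# The labels are illustrative only and are NOT the dataset's activity types.
ACTIVITY_RULES = {
    ("IssuesEvent", "opened"): "Opening issue",
    ("IssuesEvent", "closed"): "Closing issue",
    ("IssueCommentEvent", "created"): "Commenting issue",
    ("PullRequestEvent", "opened"): "Opening pull request",
    ("PushEvent", None): "Pushing commits",
    ("ForkEvent", None): "Forking repository",
}


def classify(event):
    """Return an activity label for one event dict, or None if unmapped."""
    event_type = event.get("type")
    action = event.get("payload", {}).get("action")
    # Try the specific (type, action) rule first, then the type-only rule.
    label = ACTIVITY_RULES.get((event_type, action))
    if label is None:
        label = ACTIVITY_RULES.get((event_type, None))
    return label


def summarize(events):
    """Count mapped activities per contributor login."""
    counts = {}
    for event in events:
        activity = classify(event)
        if activity is None:
            continue  # event type or action not covered by the rules
        login = event["actor"]["login"]
        counts.setdefault(login, Counter())[activity] += 1
    return counts
```

Keeping the rules in a plain dictionary makes the mapping easy to inspect and extend. A fuller pipeline would likely need payload-specific checks, for example to tell a comment on an issue apart from a comment on a pull request.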
Tue 16 May (displayed time zone: Hobart)
11:50 - 12:35 | Development Tools & Practices II | Data and Tool Showcase Track / Industry Track / Technical Papers / Registered Reports | Meeting Room 109 | Chair(s): Banani Roy (University of Saskatchewan)
11:50 | 12m | Talk | Automating Arduino Programming: From Hardware Setups to Sample Source Code Generation | Technical Papers | Imam Nur Bani Yusuf (Singapore Management University, Singapore); Diyanah Binte Abdul Jamal (Singapore Management University); Lingxiao Jiang (Singapore Management University)
12:02 | 6m | Talk | A Dataset of Bot and Human Activities in GitHub | Data and Tool Showcase Track | Natarajan Chidambaram (University of Mons); Alexandre Decan (University of Mons; F.R.S.-FNRS); Tom Mens (University of Mons)
12:08 | 6m | Talk | Mining the Characteristics of Jupyter Notebooks in Data Science Projects | Registered Reports | Morakot Choetkiertikul (Mahidol University, Thailand); Apirak Hoonlor (Mahidol University); Chaiyong Ragkhitwetsagul (Mahidol University, Thailand); Siripen Pongpaichet (Mahidol University); Thanwadee Sunetnanta (Mahidol University); Tasha Settewong (Mahidol University); Raula Gaikovina Kula (Nara Institute of Science and Technology)
12:14 | 6m | Talk | Optimizing Duplicate Size Thresholds in IDEs | Industry Track | Konstantin Grotov (JetBrains Research, Constructor University); Sergey Titov (JetBrains Research); Alexandr Suhinin (JetBrains); Yaroslav Golubev (JetBrains Research); Timofey Bryksin (JetBrains Research)
12:20 | 12m | Talk | Boosting Just-in-Time Defect Prediction with Specific Features of C Programming Languages in Code Changes | Technical Papers | Chao Ni (Zhejiang University); xiaodanxu (College of Computer Science and Technology, Zhejiang University); Kaiwen Yang (Zhejiang University); David Lo (Singapore Management University)