MSR 2023
Melbourne, Australia
co-located with ICSE 2023
Mon 15 May 2023 12:02 - 12:14 at Meeting Room 109 - Documentation + Q&A I Chair(s): Ahmad Abdellatif

Artificial intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the general public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may therefore encounter unique problems when employing open-source AI repositories.

This paper presents an empirical study of the issues in open-source AI repositories to help developers understand the problems they may encounter when employing AI systems. We collect 576 repositories from the PapersWithCode platform and, using the GitHub REST API, gather 24,953 issues from these repositories. Our empirical study comprises three phases. First, we manually analyze the issues to categorize the problems developers are likely to encounter in open-source AI repositories, producing a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we find that 67.5% of issues are closed and that half of these issues are resolved within four days. Moreover, issue management features, i.e., labeling and assigning, are not widely adopted in open-source AI repositories: only 7.81% of repositories label issues, and only 5.9% assign them to assignees. Finally, we empirically show that using GitHub issue management features and writing issues with detailed descriptions facilitate issue resolution. Based on our findings, we make recommendations to help developers better manage the issues of open-source AI repositories and improve their quality.
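The issue-collection step described in the abstract can be sketched as below. This is a minimal illustration of paginating through a repository's issues via the GitHub REST API, not the authors' actual script; the repository name, token handling, and helper names are assumptions for the example. Note that GitHub's issues endpoint also returns pull requests, which must be filtered out.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def is_issue(item):
    """GitHub's /issues endpoint also returns pull requests;
    genuine issues lack the "pull_request" key."""
    return "pull_request" not in item


def fetch_issues(repo, token=None, per_page=100):
    """Yield all issues (state=all, PRs excluded) for `repo`,
    given as "owner/name". Illustrative sketch only."""
    page = 1
    while True:
        query = urllib.parse.urlencode(
            {"state": "all", "per_page": per_page, "page": page}
        )
        req = urllib.request.Request(
            f"{API}/repos/{repo}/issues?{query}",
            headers={"Accept": "application/vnd.github+json"},
        )
        if token:  # an API token raises the rate limit
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # empty page means pagination is exhausted
            break
        yield from filter(is_issue, batch)
        page += 1
```

For example, `list(fetch_issues("owner/name"))` would gather every open and closed issue of a (hypothetical) repository, which can then be analyzed for labels, assignees, and time-to-close.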

Mon 15 May

Displayed time zone: Hobart

11:50 - 12:35
Documentation + Q&A I (Data and Tool Showcase Track / Technical Papers) at Meeting Room 109
Chair(s): Ahmad Abdellatif Concordia University
Evaluating Software Documentation Quality
Technical Papers
Henry Tang University of Alberta, Sarah Nadi University of Alberta
What Do Users Ask in Open-Source AI Repositories? An Empirical Study of GitHub Issues
Technical Papers
Zhou Yang Singapore Management University, Chenyu Wang Singapore Management University, Jieke Shi Singapore Management University, Thong Hoang CSIRO's Data61, Pavneet Singh Kochhar Microsoft, Qinghua Lu CSIRO's Data61, Zhenchang Xing, David Lo Singapore Management University
PICASO: Enhancing API Recommendations with Relevant Stack Overflow Posts
Technical Papers
Ivana Clairine Irsan Singapore Management University, Ting Zhang Singapore Management University, Ferdian Thung Singapore Management University, Kisub Kim Singapore Management University, David Lo Singapore Management University
GIRT-Data: Sampling GitHub Issue Report Templates
Data and Tool Showcase Track
Nafiseh Nikehgbal Sharif University of Technology, Amir Hossein Kargaran LMU Munich, Abbas Heydarnoori Bowling Green State University, Hinrich Schütze LMU Munich