An Empirical Study of API Misuses of Data-Centric Libraries
Developers rely on third-party library Application Programming Interfaces (APIs) when developing software. However, libraries typically come with assumptions and API usage constraints, whose violation results in API misuse. API misuses may result in crashes or incorrect behavior. Even though API misuse is a well-studied area, a recent study of API misuse of deep learning libraries showed that the nature of these misuses and their symptoms are different from misuses of traditional libraries, and as a result highlighted potential shortcomings of current misuse detection tools. We speculate that these observations may not be limited to deep learning API misuses but may stem from the data-centric nature of these APIs. Data-centric libraries often deal with diverse data structures, intricate processing workflows, and a multitude of parameters, which can make them inherently more challenging to use correctly. Therefore, understanding the potential misuses of these libraries is important to avoid unexpected application behavior.
To this end, this paper contributes an empirical study of API misuses of five data-centric libraries that cover areas such as data processing, numerical computation, machine learning, and visualization. We identify misuses of these libraries by analyzing data from both Stack Overflow and GitHub. Our results show that many of the characteristics of API misuses observed for deep learning libraries extend to misuses of the data-centric library APIs we study. We also find that developers tend to misuse APIs from data-centric libraries, regardless of whether the API directive appears in the documentation. Overall, our work exposes the challenges of API misuse in data-centric libraries, rather than only focusing on deep learning libraries. Our collected misuses and their characterization lay groundwork for future research to help reduce misuses of these libraries.
Thu 24 Oct (displayed time zone: Brussels, Copenhagen, Madrid, Paris)
11:00 - 12:35 | Open source software and repository mining | ESEM Technical Papers / ESEM Emerging Results, Vision and Reflection Papers Track | Multimedia (B3 Building - Hall) | Chair(s): Davide Taibi (University of Oulu)
11:00 | 20m | Full-paper | Sustaining Maintenance Labor for Healthy Open Source Software Projects through Human Infrastructure: A Maintainer Perspective | ESEM Technical Papers | Johan Linåker (RISE Research Institutes of Sweden), Georg Link (Bitergia), Kevin Lumbard (Creighton University)
11:20 | 20m | Full-paper | Documenting Ethical Considerations in Open Source AI Models | ESEM Technical Papers | Haoyu Gao (The University of Melbourne), Mansooreh Zahedi (The University of Melbourne), Christoph Treude (Singapore Management University), Sarita Rosenstock (The University of Melbourne), Marc Cheong (The University of Melbourne) | Pre-print
11:40 | 20m | Full-paper | An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software | ESEM Technical Papers | Lucas Franke (Virginia Tech), Huayu Liang (Virginia Tech), Sahar Farzanehpour (Virginia Tech), Aaron Brantly (Virginia Tech), James C. Davis (Purdue University), Chris Brown (Virginia Tech) | Pre-print
12:00 | 20m | Full-paper | An Empirical Study of API Misuses of Data-Centric Libraries | ESEM Technical Papers | Akalanka Galappaththi (University of Alberta), Sarah Nadi (New York University Abu Dhabi, University of Alberta), Christoph Treude (Singapore Management University) | Pre-print
12:20 | 15m | Vision and Emerging Results | Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning | ESEM Emerging Results, Vision and Reflection Papers Track | Phuong T. Nguyen (University of L'Aquila), Juri Di Rocco (University of L'Aquila), Claudio Di Sipio (University of L'Aquila), Mudita Shakya (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Massimiliano Di Penta (University of Sannio, Italy) | Pre-print