ESEIW 2024
Sun 20 - Fri 25 October 2024 Barcelona, Spain

Developers rely on third-party library Application Programming Interfaces (APIs) when developing software. However, libraries typically come with assumptions and API usage constraints, whose violation results in API misuse. API misuses may result in crashes or incorrect behavior. Even though API misuse is a well-studied area, a recent study of API misuse of deep learning libraries showed that the nature of these misuses and their symptoms are different from misuses of traditional libraries, and as a result highlighted potential shortcomings of current misuse detection tools. We speculate that these observations may not be limited to deep learning API misuses but may stem from the data-centric nature of these APIs. Data-centric libraries often deal with diverse data structures, intricate processing workflows, and a multitude of parameters, which can make them inherently more challenging to use correctly. Therefore, understanding the potential misuses of these libraries is important to avoid unexpected application behavior.
To this end, this paper contributes an empirical study of API misuses of five data-centric libraries that cover areas such as data processing, numerical computation, machine learning, and visualization. We identify misuses of these libraries by analyzing data from both Stack Overflow and GitHub. Our results show that many of the characteristics of API misuses observed for deep learning libraries extend to misuses of the data-centric library APIs we study. We also find that developers tend to misuse APIs from data-centric libraries, regardless of whether the API directive appears in the documentation. Overall, our work exposes the challenges of API misuse in data-centric libraries, rather than only focusing on deep learning libraries. Our collected misuses and their characterization lay groundwork for future research to help reduce misuses of these libraries.

Thu 24 Oct

Displayed time zone: Brussels, Copenhagen, Madrid, Paris

11:00 - 12:35
11:00
20m
Full-paper
Sustaining Maintenance Labor for Healthy Open Source Software Projects through Human Infrastructure: A Maintainer Perspective
ESEM Technical Papers
Johan Linåker RISE Research Institutes of Sweden, Georg Link Bitergia, Kevin Lumbard Creighton University
11:20
20m
Full-paper
Documenting Ethical Considerations in Open Source AI Models
ESEM Technical Papers
Haoyu Gao The University of Melbourne, Mansooreh Zahedi The University of Melbourne, Christoph Treude Singapore Management University, Sarita Rosenstock The University of Melbourne, Marc Cheong The University of Melbourne
Pre-print
11:40
20m
Full-paper
An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software
ESEM Technical Papers
Lucas Franke Virginia Tech, Huayu Liang Virginia Tech, Sahar Farzanehpour Virginia Tech, Aaron Brantly Virginia Tech, James C. Davis Purdue University, Chris Brown Virginia Tech
Pre-print
12:00
20m
Full-paper
An Empirical Study of API Misuses of Data-Centric Libraries
ESEM Technical Papers
Akalanka Galappaththi University of Alberta, Sarah Nadi New York University Abu Dhabi, University of Alberta, Christoph Treude Singapore Management University
Pre-print
12:20
15m
Vision and Emerging Results
Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning
ESEM Emerging Results, Vision and Reflection Papers Track
Phuong T. Nguyen University of L'Aquila, Juri Di Rocco University of L'Aquila, Claudio Di Sipio University of L'Aquila, Mudita Shakya University of L'Aquila, Davide Di Ruscio University of L'Aquila, Massimiliano Di Penta University of Sannio, Italy
Pre-print