Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem
Python is the most popular programming language in the open-source community, largely owing to the extensive support from diverse third-party libraries in the PyPI ecosystem. Nevertheless, using third-party libraries can lead to dependency conflicts, which has prompted researchers to develop dependency conflict detectors. Efforts have also been made to infer dependencies automatically. These approaches focus on version-level checks and inference, and they assume that the configurations of libraries in the PyPI ecosystem are correct. However, our study reveals that this assumption does not always hold, and that version-level checks alone are inadequate to ensure compatible run-time environments.
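Version-level conflict detection reasons only about the version constraints that libraries declare. The following is a minimal sketch of that idea, not the algorithm of any particular detector. It assumes a deliberately reduced subset of PEP 440 specifiers (no wildcards, pre-releases, or `~=`), and the function names `parse_version`, `parse_spec`, and `compatible_versions` are hypothetical.

```python
import operator

# Comparison functions for a deliberately reduced subset of PEP 440 specifiers
# (assumption: no wildcards, pre-releases, local versions, or "~=").
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.20.0' into (1, 20) so that '2.0' and '2.0.0' compare equal."""
    parts = [int(p) for p in text.strip().split(".")]
    while parts and parts[-1] == 0:
        parts.pop()  # drop trailing zeros
    return tuple(parts)


def parse_spec(spec: str) -> list[tuple[str, tuple[int, ...]]]:
    """Parse '>=1.20,<2.0' into [('>=', (1, 20)), ('<', (2,))]."""
    clauses = []
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before their one-character prefixes.
        for op in (">=", "<=", "==", "!=", ">", "<"):
            if clause.startswith(op):
                clauses.append((op, parse_version(clause[len(op):])))
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return clauses


def compatible_versions(available: list[str], specs: list[str]) -> list[str]:
    """Return the available versions that satisfy every declared specifier.

    An empty result means the declared constraints conflict for these candidates.
    """
    clauses = [c for spec in specs for c in parse_spec(spec)]
    return [
        v for v in available
        if all(_OPS[op](parse_version(v), bound) for op, bound in clauses)
    ]


# Hypothetical example: two libraries constrain the same dependency differently.
available = ["1.19.5", "1.26.4", "2.0.0"]
print(compatible_versions(available, [">=1.20,<2.0"]))           # ['1.26.4']
print(compatible_versions(available, [">=1.20,<2.0", ">=2.0"]))  # [] -> conflict
```

A check of this kind never inspects source code. A release whose code imports a library missing from its declared dependencies therefore passes it unchanged, which is the limitation of version-level checks noted above.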
In this paper, we conduct a comprehensive empirical study of configuration issues in the PyPI ecosystem. Specifically, we propose PyCon, a source-level detector for potential configuration issues. PyCon employs three distinct checks, targeting the setup, packaging, and usage stages of libraries, respectively. To evaluate the effectiveness of current automatic dependency inference approaches, we build a benchmark called VLibs, comprising library releases that pass all three of PyCon's checks. We identify 15 kinds of configuration issues and find that 183,864 library releases suffer from potential configuration issues. Remarkably, 68% of these issues can be detected only via the source-level check. Our experimental results show that the most advanced automatic dependency inference approach, PyEGo, successfully infers dependencies for only 65% of library releases. The primary failures stem from dependency conflicts and from required libraries that are missing in the generated configurations. Based on the empirical results, we derive six findings and draw two implications for open-source developers and for future research in automatic dependency inference.
Thu 18 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Analytics 3 | Research Track / Journal-first Papers / Demonstrations | at Maria Helena Vieira da Silva | Chair(s): Sridhar Chimalakonda (Indian Institute of Technology, Tirupati)
14:00 | 15 min talk | Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem | Research Track | Yun Peng (The Chinese University of Hong Kong), Ruida Hu (Harbin Institute of Technology, Shenzhen), Ruoke Wang (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Shuqing Li (The Chinese University of Hong Kong), Michael Lyu (The Chinese University of Hong Kong)
14:15 | 15 min talk | Data-Driven Evidence-Based Syntactic Sugar Design | Research Track | David OBrien (Iowa State University), Robert Dyer (University of Nebraska-Lincoln), Tien N. Nguyen (University of Texas at Dallas), Hridesh Rajan (Iowa State University)
14:30 | 15 min talk | Revisiting Android App Categorization | Research Track | Marco Alecci (University of Luxembourg), Jordan Samhi (CISPA Helmholtz Center for Information Security), Tegawendé F. Bissyandé (University of Luxembourg), Jacques Klein (University of Luxembourg)
14:45 | 15 min talk | Are Your Requests Your True Needs? Checking Excessive Data Collection in VPA App | Research Track | Fuman Xie (University of Queensland), Chuan Yan (University of Queensland), Mark Huasong Meng (National University of Singapore), Shaoming Teng (The University of Queensland), Yanjun Zhang (Deakin University), Guangdong Bai (University of Queensland)
15:00 | 7 min talk | Acrobats and Safety-Nets: Problematizing Large-Scale Agile Software Development | Journal-first Papers | Knut Rolland (University of Oslo), Brian Fitzgerald (Lero - The Irish Software Research Centre and University of Limerick), Torgeir Dingsøyr (Norwegian University of Science and Technology and SimulaMet), Klaas-Jan Stol (Lero; University College Cork; SINTEF Digital)
15:07 | 7 min talk | Program Transformation Landscapes for Automated Program Modification Using Gin: Extended Abstract | Journal-first Papers | Justyna Petke (University College London), Brad Alexander (University of Adelaide), Earl T. Barr (University College London), Alexander E.I. Brownlee (University of Stirling), Markus Wagner (Monash University, Australia), David R. White (University of Sheffield)
15:14 | 7 min talk | Boidae: Your Personal Mining Platform | Demonstrations | Brian Sigurdson (Bowling Green State University), Samuel W. Flint (University of Nebraska-Lincoln), Robert Dyer (University of Nebraska-Lincoln)
15:21 | 7 min talk | Code Mapper: Mapping the Global Contributions of OSS | Demonstrations | Thomas Le Tourneau (CY Tech), Jasmine Latendresse (Concordia University), Ahmad Abdellatif (University of Calgary), Emad Shihab (Concordia University)