Towards Understanding Performance Bugs in Popular Data Science Libraries
With the increasing demand for handling large-scale and complex data, data science (DS) applications often suffer from long execution times and rapid RAM exhaustion, which lead to serious issues such as unbearable delays and crashes in financial transactions. As popular DS libraries are frequently used in these applications, their performance bugs (PBs) are a major contributing factor to these issues, making it crucial to address them in order to improve overall application performance. However, PBs in popular DS libraries remain largely unexplored. To address this gap, we conducted a study of 138 PBs collected from seven popular DS libraries. Our study examined the impact of PBs and proposed a taxonomy of common root causes. We found that over half of the PBs arise from inefficient data processing operations, especially within data structures. We also explored the effort required to locate the root causes and fix these bugs, along with the challenges involved. Notably, 28% of the PBs could be fixed using simple strategies (e.g., Conditions Optimizing), suggesting the potential for automated repair approaches. Our findings highlight the severity of PBs in core DS libraries and offer insights for developing high-performance libraries and detecting PBs. Furthermore, we derived test rules from the root causes we identified and used them to find eight PBs, of which four were confirmed, demonstrating the practical utility of our findings.
Mon 23 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:30 - 12:30 | Performance
Demonstrations / Research Papers / Ideas, Visions and Reflections / Journal First / Industry Papers at Vega
Chair(s): Philipp Leitner, Chalmers | University of Gothenburg
10:30 | 20m | Talk | Accuracy Can Lie: On the Impact of Surrogate Model in Configuration Tuning | Journal First | Pengzhou Chen, University of Electronic Science and Technology of China; Jingzhi Gong, University of Leeds; Tao Chen, University of Birmingham
10:50 | 20m | Talk | Understanding Debugging as Episodes: A Case Study on Performance Bugs in Configurable Software Systems | Research Papers | Max Weber, Leipzig University; Alina Mailach, Leipzig University; Sven Apel, Saarland University; Janet Siegmund, Chemnitz University of Technology; Raimund Dachselt, Technical University of Dresden; Norbert Siegmund, Leipzig University
11:10 | 20m | Talk | Towards Understanding Performance Bugs in Popular Data Science Libraries | Research Papers | Haowen Yang, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); Zhengda Li, The Chinese University of Hong Kong, Shenzhen; Zhiqing Zhong, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); Xiaoying Tang, Chinese University of Hong Kong, Shenzhen; Pinjia He, Chinese University of Hong Kong, Shenzhen
11:30 | 20m | Talk | When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications | Industry Papers | Sören Henning, Dynatrace Research; Adriano Vogel; Esteban Pérez Wohlfeil, Dynatrace Research; Otmar Ertl, Dynatrace Research; Rick Rabiser, LIT CPS, Johannes Kepler University Linz
11:50 | 10m | Talk | LitmusKt: Concurrency Stress Testing for Kotlin | Demonstrations | Denis Lochmelis, Constructor University Bremen, JetBrains Research; Evgenii Moiseenko, JetBrains Research; Yaroslav Golubev, JetBrains Research; Anton Podkopaev, JetBrains Research, Constructor University
12:00 | 10m | Talk | Breaking the Loop: AWARE is the New MAPE-K | Ideas, Visions and Reflections
12:10 | 20m | Talk | COFFE: A Code Efficiency Benchmark for Code Generation | Research Papers | Yun Peng, The Chinese University of Hong Kong; Jun Wan, Zhejiang University; Yichen LI, The Chinese University of Hong Kong; Xiaoxue Ren, Zhejiang University
Vega is close to the registration desk.
Facing the registration desk, its entrance is on the left, close to the hotel side entrance.