An Empirical Study on the Energy Usage and Performance of Pandas and Polars Data Analysis Python Libraries (EASE 2024 - Research Papers)

Who

Felix Nahrstedt, Mehdi Karmouche, Karolina Bargieł, Pouyeh Banijamali, Apoorva Nalini Pradeep Kumar, Ivano Malavolta

Track

EASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 19 Jun 2024, 15:05 - 15:20, Room Capri, session: Program Comprehension. Chair(s): Nicole Novielli

Abstract

Context. Python’s growing popularity in data analysis and the contemporary emphasis on energy-efficient software tools call for an investigation into the energy implications of data operations, particularly in resource-intensive domains such as data science. This study provides fundamental insights for library selection, focusing on Pandas, a widely used Python data manipulation library, and Polars, a Rust-based library known for its performance.

Goal. We aim to compare and analyze the energy usage of Polars and Pandas. The study provides insights for developers and data scientists by identifying scenarios in which one library outperforms the other in terms of energy usage, and it explores possible correlations between energy usage and performance metrics.

Method. We performed four separate experiment blocks covering 8 Data Analysis Tasks (DATs) from the official TPCH benchmark published by Polars and 6 synthetic DATs. Each DAT group is run with small and large dataframes on both libraries, which yields the four blocks.
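To make the design concrete, the sketch below enumerates the experiment blocks implied by the Method paragraph: two DAT groups (8 TPCH DATs and 6 synthetic DATs), each run on small and large dataframes, with every DAT executed on both libraries. It is a minimal sketch under stated assumptions, not the authors' harness: the names `experiment_blocks`, `total_task_runs` and `time_run` are hypothetical, the repetition count is arbitrary, and energy measurement (which needs an external meter or profiler not described in this abstract) is left out, so only wall-clock time is recorded.

```python
import itertools
import time
from typing import Callable, List, Tuple

# Factors taken from the abstract: DAT counts per group, dataframe sizes, libraries.
DAT_COUNTS = {"TPCH": 8, "Synthetic": 6}
DF_SIZES = ("small", "large")
LIBRARIES = ("pandas", "polars")


def experiment_blocks() -> List[Tuple[str, str]]:
    """One block per (DAT group, dataframe size) pair: 2 groups x 2 sizes = 4 blocks."""
    return list(itertools.product(DAT_COUNTS, DF_SIZES))


def total_task_runs() -> int:
    """Task executions in one pass: every DAT of a block runs once per library."""
    return sum(DAT_COUNTS[group] * len(LIBRARIES) for group, _size in experiment_blocks())


def time_run(task: Callable[[], object], repetitions: int = 3) -> List[float]:
    """Hypothetical harness: wall-clock seconds for each repetition of one DAT.

    Energy is NOT measured here; that would require an external tool.
    """
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # List every (block, library) combination, then the size of one full pass.
    for group, size in experiment_blocks():
        for library in LIBRARIES:
            print(f"{group:<9} {size:<5} {library}")
    print("task runs per pass:", total_task_runs())
```

Counting one execution per DAT and library, one pass over all four blocks amounts to (8 + 6) × 2 sizes × 2 libraries = 56 task runs; the study's actual number of repetitions is not stated in this abstract.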

Results. Polars is more energy-efficient than Pandas when dealing with large dataframes. For small dataframes, the TPCH benchmark DATs do not show a statistically significant difference, while for the synthetic DATs, Polars performs significantly better. We identified strong positive correlations between energy usage and execution time, as well as between energy usage and memory usage for Pandas, whereas Polars did not show significant memory usage correlations in the majority of runs. Additionally, there was a significant negative correlation between energy usage and CPU usage for Pandas.
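The abstract reports correlations between energy usage and execution time, memory usage, and CPU usage, but does not name the correlation test. Assuming a rank-based test such as Spearman's rho, a common choice when measurements are not normally distributed, the coefficient for one library's set of runs can be sketched as follows. The function names are hypothetical, and the sketch returns only the coefficient; supporting claims of significance like those above would also need a p-value (for example from `scipy.stats.spearmanr`, which returns both).

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5
```

Here `x` would hold per-run energy readings and `y` the matching execution times (or memory or CPU figures) for one library; a positive rho means runs that consumed more energy also tended to rank higher on the other metric.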

Conclusions. The study recommends using Polars for energy-efficient and fast data analysis, emphasizing the importance of CPU core utilization in library selection.

Link to Preprint

https://www.ivanomalavolta.com/files/papers/EASE_2024.pdf

Felix Nahrstedt

Vrije Universiteit Amsterdam

Netherlands

Mehdi Karmouche

Vrije Universiteit Amsterdam

Netherlands

Karolina Bargieł

Vrije Universiteit Amsterdam

Netherlands

Pouyeh Banijamali

Vrije Universiteit Amsterdam

Netherlands

Apoorva Nalini Pradeep Kumar

Vrije Universiteit Amsterdam

Netherlands

Ivano Malavolta

Vrije Universiteit Amsterdam

Netherlands

Session Program

Wed 19 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20 Program Comprehension (Research Papers / Short Papers, Vision and Emerging Results) at Room Capri. Chair(s): Nicole Novielli, University of Bari

14:00, 15m talk: Adversarial Attack and Robustness Improvement on Code Summarization (Research Papers). Xi Ding (Sun Yat-sen University), Yuan Huang (Sun Yat-sen University), Xiangping Chen (Sun Yat-sen University), Jing Bian (Sun Yat-sen University)
14:15, 15m talk: Understanding Logical Expressions with Negations: It's Complicated (Research Papers). Aviad Baron (Hebrew University), Ilai Granot (Hebrew University), Ron Yosef (Hebrew University), Dror Feitelson (Hebrew University)
14:30, 15m talk: A Quantitative Investigation of Trends in Confusing Variable Pairs Through Commits: Do Confusing Variable Pairs Survive? (Research Papers). Hirohisa Aman (Ehime University), Sousuke Amasaki (Okayama Prefectural University), Tomoyuki Yokogawa (Okayama Prefectural University), Minoru Kawahara (Ehime University)
14:45, 10m talk: When simplicity meets effectiveness: Detecting code comments coherence with word embeddings and LSTM (Short Papers, Vision and Emerging Results). Michael Dubem Igbomezie (University of L'Aquila), Phuong T. Nguyen (University of L'Aquila), Davide Di Ruscio (University of L'Aquila). Pre-print
14:55, 10m talk: Exploring Influence of Feature Toggles on Code Complexity (Short Papers, Vision and Emerging Results). Md Tajmilur Rahman (Gannon University), Imran Shalabi (Gannon University), Tushar Sharma (Dalhousie University)
15:05, 15m talk: An Empirical Study on the Energy Usage and Performance of Pandas and Polars Data Analysis Python Libraries (Research Papers). Felix Nahrstedt (Vrije Universiteit Amsterdam), Mehdi Karmouche (Vrije Universiteit Amsterdam), Karolina Bargieł (Vrije Universiteit Amsterdam), Pouyeh Banijamali (Vrije Universiteit Amsterdam), Apoorva Nalini Pradeep Kumar (Vrije Universiteit Amsterdam), Ivano Malavolta (Vrije Universiteit Amsterdam). Pre-print