NLP Libraries, Energy Consumption and Runtime - An Empirical Study
In the realm of natural language processing (NLP), the rising computational demands of modern models bring energy efficiency to the forefront of sustainable computing. Pre-processing tasks, such as tokenization, stemming, and POS tagging, are critical steps in transforming raw text into structured formats suitable for machine learning models. However, despite their widespread use in numerous NLP pipelines, little attention has been given to their energy consumption.

Motivation: The increasing adoption of resource-intensive NLP models like LLMs and deep learning frameworks emphasizes the importance of optimizing every phase of the NLP pipeline, including pre-processing, which is frequently overlooked in energy studies. Analyzing pre-processing energy consumption is crucial for achieving a more sustainable and eco-friendly NLP ecosystem.

Objective: This empirical study aims to evaluate and compare the energy consumption and runtime performance of three popular NLP libraries—NLTK, spaCy, and Gensim—across six common pre-processing tasks.

Methodology: We conducted a comprehensive comparison using three distinct datasets and six pre-processing tasks. Energy consumption was measured using the Intel-RAPL and NVIDIA-SMI interfaces, while runtime performance was recorded across all library-task combinations.

Results: The results reveal substantial discrepancies in energy consumption across the three libraries, with up to 93% of cases exhibiting significant variations. Gensim showed superior efficiency in tokenization and stemming, while spaCy excelled in tasks like POS tagging and Named Entity Recognition (NER). These findings underscore the potential for optimizing NLP pre-processing tasks for energy efficiency.

Conclusion: Our study highlights the untapped potential for improving energy efficiency in NLP pipelines. These insights emphasize the need for more focused research into energy-efficient NLP techniques, especially in the pre-processing phase, to support the development of greener, more sustainable computational models.
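To make the methodology concrete, below is a minimal sketch of the kind of measurement the abstract describes: reading the cumulative Intel RAPL package-energy counter (exposed on Linux via the powercap sysfs interface) before and after a pre-processing task and taking the difference. This is an illustration under stated assumptions, not the paper's actual harness: the sysfs paths, the corpus, and the driver function nltk_tokenize are illustrative choices.

```python
# Sketch: measure CPU package energy (Intel RAPL) and runtime for one
# NLP pre-processing task. Assumes a Linux host exposing the powercap
# sysfs interface; reading energy_uj may require root on recent kernels.
import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"          # cumulative microjoules, package 0
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"   # counter wrap-around limit


def read_uj(path: str) -> int:
    """Read a RAPL counter value in microjoules."""
    with open(path) as f:
        return int(f.read().strip())


def measure(task, *args):
    """Return (energy in joules, runtime in seconds) for one task run."""
    max_uj = read_uj(RAPL_MAX)
    e0, t0 = read_uj(RAPL_ENERGY), time.perf_counter()
    task(*args)
    e1, t1 = read_uj(RAPL_ENERGY), time.perf_counter()
    delta = e1 - e0 if e1 >= e0 else e1 + max_uj - e0  # handle counter wrap
    return delta / 1e6, t1 - t0


# Illustrative task: tokenization with NLTK, one of the six tasks in the
# study. Assumes the 'punkt' tokenizer data is installed (nltk.download).
def nltk_tokenize(texts):
    from nltk.tokenize import word_tokenize
    for t in texts:
        word_tokenize(t)


if __name__ == "__main__":
    docs = ["Energy-aware NLP starts with measuring the basics."] * 10_000
    joules, secs = measure(nltk_tokenize, docs)
    print(f"{joules:.2f} J over {secs:.2f} s")
```

Swapping nltk_tokenize for an equivalent spaCy or Gensim routine yields the cross-library comparison the study performs; for GPU-side measurements, the abstract's NVIDIA-SMI interface can be polled (e.g., nvidia-smi --query-gpu=power.draw --format=csv) and the sampled power integrated over the task's runtime.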
Tue 24 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 17:40 | Fairness and Green (Journal First / Research Papers / Demonstrations) at Aurora A
Chair(s): Aldeida Aleti (Monash University)

16:00 (10m) Talk | MANILA: A Low-Code Application to Benchmark Machine Learning Models and Fairness-Enhancing Methods (Demonstrations)
Giordano d'Aloisio (University of L'Aquila)
Pre-print, Media Attached

16:10 (20m) Talk | Fairness Testing of Machine Translation Systems (Journal First)
Zeyu Sun (Institute of Software, Chinese Academy of Sciences), Zhenpeng Chen (Nanyang Technological University), Jie M. Zhang (King's College London), Dan Hao (Peking University)

16:30 (20m) Talk | Bias behind the Wheel: Fairness Testing of Autonomous Driving Systems (Journal First)
Xinyue Li (Peking University), Zhenpeng Chen (Nanyang Technological University), Jie M. Zhang (King's College London), Federica Sarro (University College London), Ying Zhang (Peking University), Xuanzhe Liu (Peking University)

16:50 (10m) Talk | FAMLEM, the FAst ModuLar Energy Meter at Code Level (Demonstrations)
Max Weber (Leipzig University), Johannes Dorn (Leipzig University), Sven Apel (Saarland University), Norbert Siegmund (Leipzig University)

17:00 (20m) Talk | NLP Libraries, Energy Consumption and Runtime - An Empirical Study (Research Papers)
Rajrupa Chattaraj (Indian Institute of Technology Tirupati, India), Sridhar Chimalakonda (Indian Institute of Technology Tirupati)
DOI

17:20 (20m) Talk | An adaptive language-agnostic pruning method for greener language models for code (Research Papers)
Mootez Saad (Dalhousie University), José Antonio Hernández López (Linköping University), Boqi Chen (McGill University), Daniel Varro (Linköping University / McGill University), Tushar Sharma (Dalhousie University)
DOI, Pre-print
Aurora A is the first room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.