NLP Libraries, Energy Consumption and Runtime - An Empirical Study
In the realm of natural language processing (NLP), the rising computational demands of modern models bring energy efficiency to the forefront of sustainable computing. Pre-processing tasks, such as tokenization, stemming, and POS tagging, are critical steps in transforming raw text into structured formats suitable for machine learning models. However, despite their widespread use in numerous NLP pipelines, little attention has been given to their energy consumption.

Motivation: The increasing adoption of resource-intensive NLP models like LLMs and deep learning frameworks emphasizes the importance of optimizing every phase of the NLP pipeline, including pre-processing, which is frequently overlooked in energy studies. Analyzing pre-processing energy consumption is crucial for achieving a more sustainable and eco-friendly NLP ecosystem.

Objective: This empirical study aims to evaluate and compare the energy consumption and runtime performance of three popular NLP libraries—NLTK, spaCy, and Gensim—across six common pre-processing tasks.

Methodology: We conducted a comprehensive comparison using three distinct datasets and six pre-processing tasks. Energy consumption was measured using the Intel-RAPL and NVIDIA-SMI interfaces, while runtime performance was recorded across all library-task combinations.

Results: The results reveal substantial discrepancies in energy consumption across the three libraries, with up to 93% of cases exhibiting significant variations. Gensim showed superior efficiency in tokenization and stemming, while spaCy excelled in tasks like POS tagging and Named Entity Recognition (NER). These findings underscore the potential for optimizing NLP pre-processing tasks for energy efficiency.

Conclusion: Our study highlights the untapped potential for improving energy efficiency in NLP pipelines. These insights emphasize the need for more focused research into energy-efficient NLP techniques, especially in the pre-processing phase, to support the development of greener, more sustainable computational models.
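To make the methodology concrete, below is a minimal sketch of the kind of measurement the abstract describes: reading the cumulative Intel RAPL package-energy counter (exposed on Linux via the powercap sysfs interface) before and after a pre-processing task and taking the difference. This is an illustration under stated assumptions, not the paper's actual harness: the sysfs paths, the corpus, and the driver function nltk_tokenize are illustrative choices.

```python
# Sketch: measure CPU package energy (Intel RAPL) and runtime for one
# NLP pre-processing task. Assumes a Linux host exposing the powercap
# sysfs interface; reading energy_uj may require root on recent kernels.
import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"          # cumulative microjoules, package 0
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"   # counter wrap-around limit


def read_uj(path: str) -> int:
    """Read a RAPL counter value in microjoules."""
    with open(path) as f:
        return int(f.read().strip())


def measure(task, *args):
    """Return (energy in joules, runtime in seconds) for one task run."""
    max_uj = read_uj(RAPL_MAX)
    e0, t0 = read_uj(RAPL_ENERGY), time.perf_counter()
    task(*args)
    e1, t1 = read_uj(RAPL_ENERGY), time.perf_counter()
    delta = e1 - e0 if e1 >= e0 else e1 + max_uj - e0  # handle counter wrap
    return delta / 1e6, t1 - t0


# Illustrative task: tokenization with NLTK, one of the six tasks in the
# study. Assumes the 'punkt' tokenizer data is installed (nltk.download).
def nltk_tokenize(texts):
    from nltk.tokenize import word_tokenize
    for t in texts:
        word_tokenize(t)


if __name__ == "__main__":
    docs = ["Energy-aware NLP starts with measuring the basics."] * 10_000
    joules, secs = measure(nltk_tokenize, docs)
    print(f"{joules:.2f} J over {secs:.2f} s")
```

Swapping nltk_tokenize for an equivalent spaCy or Gensim routine yields the cross-library comparison the study performs; for GPU-side measurements, the abstract's NVIDIA-SMI interface can be polled (e.g., nvidia-smi --query-gpu=power.draw --format=csv) and the sampled power integrated over the task's runtime.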
Tue 24 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 17:40 | Fairness and Green (Journal First / Research Papers / Demonstrations) at Aurora A
Chair(s): Aldeida Aleti (Monash University)

16:00 (10m) Talk | MANILA: A Low-Code Application to Benchmark Machine Learning Models and Fairness-Enhancing Methods (Demonstrations)
Giordano d'Aloisio (University of L'Aquila)
Pre-print, Media Attached

16:10 (20m) Talk | Fairness Testing of Machine Translation Systems (Journal First)
Zeyu Sun (Institute of Software, Chinese Academy of Sciences), Zhenpeng Chen (Nanyang Technological University), Jie M. Zhang (King's College London), Dan Hao (Peking University)

16:30 (20m) Talk | Bias behind the Wheel: Fairness Testing of Autonomous Driving Systems (Journal First)
Xinyue Li (Peking University), Zhenpeng Chen (Nanyang Technological University), Jie M. Zhang (King's College London), Federica Sarro (University College London), Ying Zhang (Peking University), Xuanzhe Liu (Peking University)

16:50 (10m) Talk | FAMLEM, the FAst ModuLar Energy Meter at Code Level (Demonstrations)
Max Weber (Leipzig University), Johannes Dorn (Leipzig University), Sven Apel (Saarland University), Norbert Siegmund (Leipzig University)

17:00 (20m) Talk | NLP Libraries, Energy Consumption and Runtime - An Empirical Study (Research Papers)
Rajrupa Chattaraj (Indian Institute of Technology Tirupati, India), Sridhar Chimalakonda (Indian Institute of Technology Tirupati)
DOI

17:20 (20m) Talk | An adaptive language-agnostic pruning method for greener language models for code (Research Papers)
Mootez Saad (Dalhousie University), José Antonio Hernández López (Linköping University), Boqi Chen (McGill University), Daniel Varro (Linköping University / McGill University), Tushar Sharma (Dalhousie University)
DOI, Pre-print
Aurora A is the first room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.