On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content (CAIN 2025 - Research and Experience Papers)

Who

Vince Nguyen, Hieu Huynh, Vidya Dhopate, Anusha Annengala, Hiba Bouhlal, Gian Luca Scoccia, Matias Martinez, Vincenzo Stoico, Ivano Malavolta

Track

CAIN 2025 Research and Experience Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Sun 27 Apr 2025 14:30 - 14:45 at 208 - Architecting and Testing AI Systems Chair(s): Jan-Philipp Steghöfer

Abstract

Context. While on-device LLMs offer higher privacy over their remotely-hosted counterparts and do not require Internet connectivity, their energy consumption on the client device still remains insufficiently investigated.

Goal. This study empirically evaluates the energy usage of client devices when fetching LLM-generated content on-device versus from a remote server. Our goal is to help software developers make informed decisions on the most energy-efficient method for fetching content in different scenarios, so as to optimize the client device’s energy consumption.

Method. We conduct a controlled experiment with seven LLMs with varying parameter sizes running on a MacBook Pro M2 and on a remote server. The experiment involves fetching content of different lengths from the LLMs deployed either on-device or remotely, while measuring the client device’s energy usage and performance metrics such as execution time, CPU, GPU, and memory usage.

Results. Fetching LLM-generated content from a remote server uses 4 to 9 times less energy compared to the on-device method, with a large effect size. We observe a consistent strong positive correlation between energy usage and execution time across all content lengths and fetch methods. For the on-device method, GPU and memory usage are positively correlated with energy usage.

Conclusions. We recommend offloading LLM-generated content to a remote server rather than generating it on-device to optimize energy efficiency on the client side. Developers should optimize on-device LLMs to decrease execution time, GPU usage, and memory usage.

Link to Preprint

http://www.ivanomalavolta.com/files/papers/CAIN_2025.pdf

Vince Nguyen

Vrije Universiteit Amsterdam

Hieu Huynh

Vrije Universiteit Amsterdam

Vidya Dhopate

Vrije Universiteit Amsterdam

Anusha Annengala

Vrije Universiteit Amsterdam

Hiba Bouhlal

Vrije Universiteit Amsterdam

Gian Luca Scoccia

Gran Sasso Science Institute

Matias Martinez

Universitat Politècnica de Catalunya (UPC)

Spain

Vincenzo Stoico

Vrije Universiteit Amsterdam

Netherlands

Ivano Malavolta

Vrije Universiteit Amsterdam

Netherlands

Slide deck

Replication package

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada) change

14:00 - 15:30	Architecting and Testing AI SystemsResearch and Experience Papers at 208 Chair(s): Jan-Philipp Steghöfer XITASO GmbH IT & Software Solutions

14:00 15m Talk		How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model IntegrationDistinguished paper Award Candidate Research and Experience Papers Shreyas Kumar Parida ETH Zurich, Ilias Gerostathopoulos Vrije Universiteit Amsterdam, Justus Bogner Vrije Universiteit Amsterdam Pre-print
14:15 15m Talk		RAGProbe: Breaking RAG Pipelines with Evaluation ScenariosDistinguished paper Award Candidate Research and Experience Papers Shangeetha Sivasothy Applied Artificial Intelligence Institute, Deakin University, Scott Barnett Deakin University, Australia, Stefanus Kurniawan Deakin University, Zafaryab Rasool Applied Artificial Intelligence Institute, Deakin University, Rajesh Vasa Deakin University, Australia
14:30 15m Talk		On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content Research and Experience Papers Vince Nguyen Vrije Universiteit Amsterdam, Hieu Huynh Vrije Universiteit Amsterdam, Vidya Dhopate Vrije Universiteit Amsterdam, Anusha Annengala Vrije Universiteit Amsterdam, Hiba Bouhlal Vrije Universiteit Amsterdam, Gian Luca Scoccia Gran Sasso Science Institute, Matias Martinez Universitat Politècnica de Catalunya (UPC), Vincenzo Stoico Vrije Universiteit Amsterdam, Ivano Malavolta Vrije Universiteit Amsterdam Pre-print Media Attached
14:45 10m Talk		LoCoML: A Framework for Real-World ML Inference Pipelines Research and Experience Papers Kritin Maddireddy IIIT Hyderabad, Santhosh Kotekal Methukula IIIT Hyderabad, Chandrasekar S IIIT Hyderabad, Karthik Vaidhyanathan IIIT Hyderabad
14:55 10m Talk		Towards Continuous Experiment-driven MLOps Research and Experience Papers Keerthiga Rajenthiram Vrije Universiteit Amsterdam, Milad Abdullah Charles University, Ilias Gerostathopoulos Vrije Universiteit Amsterdam, Petr Hnětynka Charles University, Tomas Bures Charles University, Czech Republic, Gerard Pons Universitat Politècnica de Catalunya, Barcelona, Spain, Besim Bilalli Universitat Politècnica de Catalunya, Barcelona, Spain, Anna Queralt Universitat Politècnica de Catalunya, Barcelona, Spain
15:05 25m Other		Discussion Research and Experience Papers