ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Thu 16 Apr 2026 11:30 - 11:45 at Oceania VII - Software Engineering for AI 4 Chair(s): Nathan Wintersgill

The rapid adoption of large language models (LLMs) has raised concerns about their substantial energy consumption, especially when deployed at industry scale. While several techniques have been proposed to address this, limited empirical evidence exists regarding the effectiveness of applying them to LLM-based industry applications. To fill this gap, we analyzed a chatbot application in an industrial context at Schuberg Philis, a Dutch IT services company. We then selected four techniques, namely Small and Large Model Collaboration, Prompt Optimization, Quantization, and Batching, applied them to the application in eight variations, and then conducted experiments to study their impact on energy consumption, accuracy, and response time compared to the unoptimized baseline.

Our results show that several techniques, such as Prompt Optimization and 2-bit Quantization, reduced energy use significantly, sometimes by up to 90%. However, these techniques degraded accuracy in particular, to a degree that is not acceptable in practice. The only technique that achieved significant and strong energy reductions without substantially harming the other qualities was Small and Large Model Collaboration via NPCC with prompt complexity thresholds. This highlights that reducing the energy consumption of LLM-based applications is not difficult in practice. However, improving their energy efficiency, i.e., reducing energy use without harming other qualities, remains challenging. Our study provides practical insights to move towards this goal.
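To make the idea of small and large model collaboration with a prompt complexity threshold more concrete, the following is a minimal sketch, not the implementation used in the study. It assumes a complexity score in [0, 1] and sends prompts below a threshold to a small model and all others to a large model. The `complexity_score` heuristic, the `Router` class, the threshold value, and both model callables are hypothetical stand-ins introduced only for illustration; the abstract does not describe how NPCC computes complexity.

```python
from dataclasses import dataclass
from typing import Callable


def complexity_score(prompt: str) -> float:
    """Hypothetical stand-in for a prompt complexity classifier.

    Returns a value in [0, 1]. Longer prompts and prompts with more
    question marks score higher. This is NOT the classifier from the study.
    """
    words = prompt.split()
    length_part = min(len(words) / 50.0, 1.0)
    question_part = min(prompt.count("?") / 3.0, 1.0)
    return 0.7 * length_part + 0.3 * question_part


@dataclass
class Router:
    """Routes each prompt to a small or a large model by complexity."""

    small_model: Callable[[str], str]
    large_model: Callable[[str], str]
    threshold: float = 0.5  # assumed value; would need tuning per application

    def route(self, prompt: str) -> str:
        # Prompts scoring below the threshold go to the cheaper small model.
        return "small" if complexity_score(prompt) < self.threshold else "large"

    def answer(self, prompt: str) -> str:
        if self.route(prompt) == "small":
            return self.small_model(prompt)
        return self.large_model(prompt)


# Placeholder models that only tag the prompt; real ones would call an LLM.
router = Router(
    small_model=lambda p: f"[small] {p}",
    large_model=lambda p: f"[large] {p}",
    threshold=0.5,
)
```

In such a setup, the threshold is the main tuning knob: raising it sends more traffic to the small model, which would be expected to lower energy use at the risk of lower accuracy on harder prompts.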

Thu 16 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
11:00
15m
Talk
NeMo: A Neuron-level Modularizing-While-Training Approach for Decomposing DNN Models
Journal-first Papers
Xiaohan Bi Beihang University, Binhang Qi National University of Singapore, Hailong Sun Beihang University, Xiang Gao Beihang University, Yue Yu PengCheng Lab, Xiaojun Liang PengCheng Lab
11:15
15m
Talk
A Selective Quantization Tuner for ONNX Models
New Ideas and Emerging Results (NIER)
Nikolaos Louloudakis The University of Edinburgh, Ajitha Rajan The University of Edinburgh
11:30
15m
Paper
Green LLM Techniques in Action: How Effective Are Existing Techniques for Improving the Energy Efficiency of LLM-Based Applications in Industry?
SE In Practice (SEIP)
Pelin Rabia Kuran Vrije Universiteit Amsterdam, Rumbidzai Chitakunye Vrije Universiteit Amsterdam, Vincenzo Stoico Vrije Universiteit Amsterdam, Ilja Heitlager Schuberg Philis, Justus Bogner Vrije Universiteit Amsterdam
11:45
15m
Talk
DNN Modularization via Activation-Driven Training
Research Track
Tuan Ngo University of Southern California, Abid Hassan University of Southern California, Saad Shafiq University of Southern California, Nenad Medvidović University of Southern California
12:00
15m
Talk
ModularEvo: Evolving Multi-Task Models via Neural Network Modularization and Composition
Research Track
Wenrui Long Beihang University, Binhang Qi Beihang University, Hailong Sun Beihang University, ZongZhen Yang Beihang University, Ruobing Zhao Beihang University, Xiang Gao Beihang University
12:15
15m
Talk
The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget (Distinguished Paper Award)
Research Track
Dangfeng Pan, Zhensu Sun Singapore Management University, Cenyuan Zhang Monash University, David Lo Singapore Management University, Xiaoning Du Monash University