Benchmarking Emerging Deep Learning Quantization Methods for Energy Efficiency
GREENS 2024
In the era of generative artificial intelligence (AI), the demand for energy-efficient AI models is growing. The ever-larger size of recent AI models has driven the adoption of quantization techniques that reduce large models’ computing and memory requirements. This study compares the energy consumption of five quantization methods, viz. Gradient-based Post-Training Quantization (GPTQ), Activation-aware Weight Quantization (AWQ), GPT-Generated Model Language (GGML), GPT-Generated Unified Format (GGUF), and Bits and Bytes (BNB). We benchmark and analyze the energy efficiency of these commonly used quantization methods during inference. This preliminary exploration found that GGML and its successor GGUF were the most energy-efficient quantization methods. Our findings reveal significant variability in energy profiles across methods, challenging the notion that lower precision universally improves efficiency. The results underscore the need to benchmark quantization techniques from an energy perspective, beyond model compression alone. Our findings could guide the selection of quantized models and the development of new quantization techniques that prioritize energy efficiency, potentially leading to more environmentally friendly AI deployments.
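As a rough illustration of the kind of measurement the abstract describes, the sketch below shows one way to record the energy of a single inference pass for a 4-bit model quantized with bitsandbytes (BNB), using the codecarbon tracker. This is not the authors' benchmark harness; the model identifier, prompt, and choice of codecarbon as the energy meter are illustrative assumptions.

```python
# Minimal sketch: measure inference energy of a BNB 4-bit quantized model.
# Assumptions (not from the paper): codecarbon as the energy meter,
# "facebook/opt-1.3b" as a placeholder causal LM, a single short prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from codecarbon import EmissionsTracker

model_id = "facebook/opt-1.3b"  # placeholder; any causal LM works
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer(
    "Explain model quantization in one sentence.",
    return_tensors="pt",
).to(model.device)

# Track energy and emissions around the generation call only.
tracker = EmissionsTracker(project_name="bnb-4bit-inference")
tracker.start()
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=128)
emissions_kg = tracker.stop()  # estimated CO2-equivalent in kg

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
print(f"Energy consumed: {tracker.final_emissions_data.energy_consumed:.6f} kWh")
```

Repeating such a measurement over many prompts, and across models quantized with GPTQ, AWQ, GGML/GGUF, and BNB, would yield the per-method energy profiles that a comparison like this one requires.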