Quantization Is Not a Dealbreaker: Empirical Insights from Large Code Models
Large Language Models (LLMs) have showcased exceptional capabilities across a wide range of domains, including Software Engineering (SE). Within this field, Large Code Models (LCMs), a specialized subset of LLMs tailored to coding tasks, have made significant strides in automating SE practices such as bug fixing, code generation, and code summarization, elevating their effectiveness to unprecedented levels. These models, which often feature billions of parameters, deliver outstanding performance at the expense of substantial memory and computational requirements. The growing scale of LLMs not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach for reducing the resource demands of LLMs, and LCMs in particular, by decreasing parameter precision (e.g., from 16-bit to 4-bit) without substantially affecting performance. While recent studies confirm that quantization preserves code correctness, they provide limited insights into practical considerations, particularly regarding the impact on software quality attributes such as reliability, maintainability, security, and static properties (e.g., cyclomatic complexity). Building upon this line of research, our study investigates the impact of quantization on the qualitative aspects of automatically generated code. To this end, we apply Activation-aware Weight Quantization (AWQ) to two popular code models, CodeLlama and DeepSeekCoder, to generate Java and Python code. Using advanced static analysis tools, we measure software quality metrics and static code features, including cyclomatic complexity, cognitive complexity, and lines of code (LOC). Our findings reveal mixed outcomes: quantized models generally produce code that is more complex, longer, and less reliable, yet more maintainable than their full-precision counterparts, with notable variations across model sizes. These results emphasize that quantization is not a ‘one-size-fits-all’ technique and highlight the need to account for model-specific factors in real-world applications.
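To make the quantization setup concrete, the following minimal sketch shows how AWQ 4-bit quantization could be applied to one of the studied models; it assumes the AutoAWQ library, an example Hugging Face model identifier, and default-style quantization settings, none of which are prescribed by the abstract itself.

# Minimal sketch: 4-bit AWQ quantization of a code model (illustrative;
# the AutoAWQ library, model identifier, and settings are assumptions).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "codellama/CodeLlama-7b-hf"   # example code model on the Hugging Face Hub
quant_path = "CodeLlama-7b-awq-4bit"       # output directory for the quantized weights
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Apply activation-aware weight quantization (16-bit weights to 4-bit).
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized model for later code-generation experiments.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

Code generated by the resulting quantized model and its full-precision counterpart could then be compared with static analysis tools to obtain the quality metrics discussed above.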