Quantization Is Not a Dealbreaker: Empirical Insights from Large Code Models
Large Language Models (LLMs) have showcased exceptional capabilities across a wide range of domains, including Software Engineering (SE). Within this field, Large Code Models (LCMs), a specialized subset of LLMs tailored to coding tasks, have made significant strides in automating SE practices such as bug fixing, code generation, and code summarization, elevating their effectiveness to unprecedented levels. These models, which often feature billions of parameters, deliver outstanding performance at the expense of substantial memory and computational requirements. The growing scale of LLMs not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach for reducing the resource demands of LLMs, and LCMs in particular, by decreasing parameter precision (e.g., from 16-bit to 4-bit) without substantially affecting performance. While recent studies confirm that quantization preserves code correctness, they provide limited insights into practical considerations, particularly regarding the impact on software quality attributes such as reliability, maintainability, security, and static properties (e.g., cyclomatic complexity). Building upon this line of research, our study investigates the impact of quantization on the qualitative aspects of automatically generated code. To this end, we apply Activation-aware Weight Quantization (AWQ) to two popular code models, CodeLlama and DeepSeekCoder, to generate Java and Python code. Using advanced static analysis tools, we measure software quality metrics and static code features, including cyclomatic complexity, cognitive complexity, and lines of code (LOC). Our findings reveal mixed outcomes: quantized models generally produce code that is more complex, longer, and less reliable, yet more maintainable than their full-precision counterparts, with notable variations across model sizes. These results emphasize that quantization is not a ‘one-size-fits-all’ technique and highlight the need to account for model-specific factors in real-world applications.
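To make the quantization setup concrete, the following minimal sketch shows how AWQ 4-bit quantization could be applied to one of the studied models; it assumes the AutoAWQ library, an example Hugging Face model identifier, and default-style quantization settings, none of which are prescribed by the abstract itself.

# Minimal sketch: 4-bit AWQ quantization of a code model (illustrative;
# the AutoAWQ library, model identifier, and settings are assumptions).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "codellama/CodeLlama-7b-hf"   # example code model on the Hugging Face Hub
quant_path = "CodeLlama-7b-awq-4bit"       # output directory for the quantized weights
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Apply activation-aware weight quantization (16-bit weights to 4-bit).
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized model for later code-generation experiments.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

Code generated by the resulting quantized model and its full-precision counterpart could then be compared with static analysis tools to obtain the quality metrics discussed above.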