BRACE: Unified Benchmarking of Accuracy and Energy for Code Language Models
The rapid advancement of AI technologies and their accelerated adoption in software development necessitate a systematic evaluation of their environmental impact alongside functional correctness. While prior studies have examined sustainability in large language models, existing approaches lack systematic frameworks for evaluating accuracy-energy trade-offs in Code Language Models (CLMs). In this paper, we present BRACE, a framework to benchmark CLMs on a unified scale of energy efficiency and functional correctness (referred to as accuracy). We benchmark 22 state-of-the-art models on code generation and summarization tasks and propose two rating methods: Concentric Incremental Rating Circles (CIRC) and Observation to Expectation Rating (OTER). CIRC provides deterministic Euclidean-based rankings with static trade-offs that are robust to outliers, while OTER offers trend-aware evaluation with dynamic trade-offs that capture the complex correlation between energy and accuracy; each offers a distinct perspective on the problem. These rating methods enable us to rate LLMs on a 1-5 scale reflecting their combined energy efficiency and functional correctness. Our analysis reveals that models generally perform better on code summarization tasks, as they are not required to generate grammar-based, syntactically correct output. We also find that model size does not have a significant impact on ratings, indicating that models that use their parameters efficiently can be ranked higher on the energy-accuracy scale.
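To make the idea of a distance-based 1-5 rating concrete, the following is a minimal illustrative sketch, not the paper's CIRC definition. It assumes that energy and accuracy have already been normalized to [0, 1], that the ideal point is (energy = 0, accuracy = 1), and that the rating rings have equal width; the function name `circ_like_rating` and all of these choices are hypothetical.

```python
import math


def circ_like_rating(energy_norm: float, accuracy_norm: float, levels: int = 5) -> int:
    """Rate a model from 1 (worst) to `levels` (best).

    Assumptions (not taken from the paper): inputs are normalized to [0, 1],
    lower energy is better, higher accuracy is better, and the rings around
    the ideal point all have the same width.
    """
    if not (0.0 <= energy_norm <= 1.0 and 0.0 <= accuracy_norm <= 1.0):
        raise ValueError("inputs must be normalized to [0, 1]")

    # Euclidean distance from the assumed ideal point (energy=0, accuracy=1).
    dist = math.hypot(energy_norm, 1.0 - accuracy_norm)

    # Largest possible distance inside the unit square.
    max_dist = math.sqrt(2.0)

    # Index of the concentric ring the point falls into (0 = innermost).
    ring = min(int(dist / max_dist * levels), levels - 1)

    # Inner rings receive higher ratings.
    return levels - ring


if __name__ == "__main__":
    # Hypothetical normalized measurements, for illustration only.
    print(circ_like_rating(0.0, 1.0))
    print(circ_like_rating(0.5, 0.5))
    print(circ_like_rating(1.0, 0.0))
```

Because the ring boundaries are fixed in advance, the trade-off between energy and accuracy in this sketch is static: a model's rating depends only on its own normalized position, not on how the other models are distributed.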
| BRACE presentation (AI4SE-3.pdf) | 1.11MiB |
Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
Session: 09:45 - 10:30
| Time | Duration | Type | Title | Track | Authors | Attachments |
| 09:45 | 5m | Talk | Understanding and Mitigating Library-Related Issues in LLM-Generated Code | Journal Ahead Workshop (JAWs) | Yacine Majdoub (University of Gabes); Rinad Hamid (University of Calgary, Canada); Eya Ben Charrada (University of Gabes); Ahmad Abdellatif (University of Calgary); Haifa Touati (IReSCoMath Research Lab, Faculty of Sciences, University Of Gabes, Tunisia) | |
| 09:50 | 5m | Talk | Magnifying Inefficiency: How LLMs Amplify Performance Anti-Patterns in Mobile Development | Journal Ahead Workshop (JAWs) | | |
| 09:55 | 5m | Talk | BRACE: Unified Benchmarking of Accuracy and Energy for Code Language Models | Journal Ahead Workshop (JAWs) | Mohammadjavad Mehditabar (Dalhousie University); Saurabhsingh Rajput (Dalhousie University); Antonio Mastropaolo (William and Mary, USA); Tushar Sharma (Dalhousie University) | Pre-print, File Attached |
| 10:00 | 5m | Talk | Learning Model Mutations From Faults in Deep Learning | Journal Ahead Workshop (JAWs) | Zaheed Ahmed (Institute of Computer Science, University of Göttingen, Lower Saxony, Germany); Philip Makedonski (Institute of Computer Science, University of Göttingen, Lower Saxony, Germany); Jens Grabowski | Media Attached |
| 10:05 | 5m | Talk | Artificial or Just Artful? Do LLMs Bend the Rules in Programming? | Journal Ahead Workshop (JAWs) | Oussama Ben Sghaier (Queen's University); Kévin Delcourt (Université de Montréal); Houari Sahraoui (DIRO, Université de Montréal) | |
| 10:10 | 5m | Talk | Towards Automated User Story Quality Assessment with LLMs: An Empirical Study on Syntactic and Pragmatic QUS Criteria | Journal Ahead Workshop (JAWs) | Izabella Silva (Federal University of Campina Grande - ISE/VIRTUS); Emanuel Dantas Filho (Federal University of Campina Grande - ISE/VIRTUS); Ademar Sousa Neto (VIRTUS/UFCG); Mirko Perkusich (VIRTUS); Danyllo Albuquerque (VIRTUS/UFCG); Kyller Costa Gorgônio (Federal University of Campina Grande); Angelo Percusich (Federal University of Campina Grande - ISE/VIRTUS) | |
| 10:15 | 5m | Talk | MARS: Few-Shot Android Malware Detection with RAG-Enhanced LLMs | Journal Ahead Workshop (JAWs) | Guangquan Xu (School of Cybersecurity, Tianjin University); Minhong Dong (School of Cybersecurity, Tianjin University); Qi Guo (Tianjin University); Hongpeng Bai (School of Cybersecurity, Tianjin University); Yao Zhang (Tianjin University); Ruitao Feng (Southern Cross University); Wenying He (Hebei University of Technology); Yude Bai (Tianjin University); Ji Zhang (University of Southern Queensland) | |
| 10:20 | 5m | Talk | A Closer Look at the Malicious Pre-Trained Models on Hugging Face | Journal Ahead Workshop (JAWs) | Junwei Zhang (Zhejiang University); Xing Hu (Zhejiang University); Xin Xia (Zhejiang University); David Lo (Singapore Management University); Shanping Li (Zhejiang University) | |