Pre-trained code models have achieved notable success in Software Engineering (SE). However, existing studies have predominantly focused on improving model performance, with limited attention given to other critical aspects such as model calibration. Model calibration, which refers to the accurate estimation of predictive uncertainty, is a vital consideration in practical applications. To advance the understanding of model calibration in SE, we conduct a comprehensive investigation into the calibration of pre-trained code models. Our investigation covers five pre-trained code models and four code understanding tasks, and analyzes calibration in both in-distribution and out-of-distribution settings. We uncover several key insights: (1) pre-trained code models may suffer from over-confidence; (2) temperature scaling and label smoothing are effective in calibrating code models on in-distribution data; (3) over-confidence in pre-trained code models worsens in different out-of-distribution settings, and the effectiveness of temperature scaling and label smoothing diminishes. All materials used in our experiments are available at https://anonymous.4open.science/r/Calibration-of-Pretrained-Code-Models-C80C.
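To make the terms in the abstract concrete, the following is a minimal sketch of how over-confidence can be measured and how the two calibration techniques operate. It is not the authors' implementation. The abstract does not name a metric, so Expected Calibration Error (ECE) with equal-width bins is an assumption here, as is fitting the temperature by grid search on validation logits. All function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn one sample's logits into probabilities; temperature > 1 softens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: sample-weighted gap between accuracy and mean confidence per bin."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum(conf), sum(correct)
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    return sum(abs(hit - conf) / n for cnt, conf, hit in bins if cnt)

def negative_log_likelihood(logits, labels, temperature):
    """Mean NLL of the true labels under temperature-scaled probabilities."""
    total = 0.0
    for z, y in zip(logits, labels):
        total -= math.log(softmax(z, temperature)[y] + 1e-12)
    return total / len(labels)

def fit_temperature(val_logits, val_labels):
    """Pick the temperature in [0.5, 10.0] that minimizes validation NLL."""
    grid = [0.5 + 0.1 * i for i in range(96)]
    return min(grid, key=lambda t: negative_log_likelihood(val_logits, val_labels, t))

def smooth_label(label, n_classes, eps=0.1):
    """Label smoothing: move eps of the target mass to a uniform distribution."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / n_classes
            for k in range(n_classes)]

# Toy over-confident model: very confident everywhere, but only 3 of 4 correct.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
ece_before = expected_calibration_error([softmax(z) for z in val_logits], val_labels)
temperature = fit_temperature(val_logits, val_labels)
ece_after = expected_calibration_error(
    [softmax(z, temperature) for z in val_logits], val_labels)
```

On this toy input the fitted temperature is greater than 1 and the ECE drops after scaling. Temperature scaling is applied after training and leaves the predicted class unchanged, whereas label smoothing changes the training targets.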
Fri 19 Apr | Displayed time zone: Lisbon
14:00 - 15:30 | Language Models and Generated Code 3 | Research Track / Demonstrations at Almada Negreiros | Chair(s): Jie M. Zhang (King's College London)
14:00 | 15m | Talk | CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models | Research Track | Hao Yu (Peking University), Bo Shen (Huawei Cloud Computing Technologies Co., Ltd.), Dezhi Ran (Peking University), Jiaxin Zhang (Huawei Cloud Computing Technologies Co., Ltd.), Qi Zhang (Huawei Cloud Computing Technologies Co., Ltd.), Yuchi Ma (Huawei Cloud Computing Technologies CO., LTD.), Guangtai Liang (Huawei Cloud Computing Technologies), Ying Li (School of Software and Microelectronics, Peking University, Beijing, China), Qianxiang Wang (Huawei Technologies Co., Ltd), Tao Xie (Peking University)
14:15 | 15m | Talk | Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment | Research Track | Shibbir Ahmed (Iowa State University), Hongyang Gao (Dept. of Computer Science, Iowa State University), Hridesh Rajan (Iowa State University)
14:30 | 15m | Talk | GrammarT5: Grammar-Integrated Pretrained Encoder-Decoder Neural Model for Code | Research Track | Qihao Zhu (Peking University), Qingyuan Liang (Peking University), Zeyu Sun (Institute of Software, Chinese Academy of Sciences), Yingfei Xiong (Peking University), Lu Zhang (Peking University), Shengyu Cheng (ZTE Corporation)
14:45 | 15m | Talk | On Calibration of Pre-trained Code Models | Research Track
15:00 | 15m | Talk | Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models | Research Track | Shuzheng Gao, Wenxin Mao (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Li Li (Beihang University), Xing Hu (Zhejiang University), Xin Xia (Huawei Technologies), Michael Lyu (The Chinese University of Hong Kong)
15:15 | 7m | Talk | GitHubInclusifier: Finding and fixing non-inclusive language in GitHub Repositories | Demonstrations | Liam Todd (Monash University), John Grundy (Monash University), Christoph Treude (Singapore Management University)