ICSE 2024
Fri 12 - Sun 21 April 2024, Lisbon, Portugal
Fri 19 Apr 2024 14:00 - 14:15 at Almada Negreiros - Language Models and Generated Code 3 Chair(s): Jie M. Zhang

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the effectiveness of these models, multiple benchmarks (e.g., HumanEval and AiXBench) have been proposed, but they include only tasks of generating a standalone function, i.e., a function that invokes or accesses only built-in functions and standard libraries. However, standalone functions constitute only about 30% of the functions in popular open-source projects, and evaluating models only on standalone functions cannot reflect their effectiveness in pragmatic code generation scenarios (i.e., code generation for real settings of open-source or proprietary code).
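To illustrate the distinction, a standalone function relies only on built-ins and the standard library, whereas a non-standalone function additionally depends on code defined elsewhere in its project. The sketch below is purely illustrative (the names `Session`, `DEFAULT_TIMEOUT`, and the functions are hypothetical, not taken from CoderEval):

```python
import re

# --- Standalone function: uses only built-ins and the standard library. ---
def count_words(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(re.findall(r"\S+", text))

# --- Project-level context. In a real repository, the constant and class
# below would live in other files; they are defined here only so that the
# snippet is self-contained. ---
DEFAULT_TIMEOUT = 5.0

class Session:
    def get(self, path: str, timeout: float) -> dict:
        # Placeholder for an HTTP call made by the project's own client.
        return {"path": path, "timeout": timeout}

# --- Non-standalone function: depends on the project-defined Session class
# and DEFAULT_TIMEOUT constant, i.e., on context outside the function itself. ---
def fetch_user(session: Session, user_id: int) -> dict:
    """Fetch a user record via the project's Session client."""
    return session.get(f"/users/{user_id}", timeout=DEFAULT_TIMEOUT)
```

Generating `fetch_user` correctly requires the model to resolve names defined outside the function, which is exactly the kind of context dependency that standalone-only benchmarks do not exercise.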

To help bridge this gap, in this paper we propose CoderEval, a benchmark consisting of 230 Python and 230 Java code generation tasks carefully curated from popular real-world open-source projects, together with a self-contained execution platform that automatically assesses the functional correctness of generated code. CoderEval supports code generation tasks at six levels of context dependency, where context refers to code elements such as types, APIs, variables, and constants defined outside the function under generation but within dependent third-party libraries, the current class, file, or project. CoderEval can thus be used to evaluate model effectiveness in generating code beyond standalone functions. By evaluating three publicly available state-of-the-art code generation models (CodeGen, PanGu-Coder, and ChatGPT) on CoderEval and HumanEval, we find that these models are substantially more effective at generating standalone functions than non-standalone functions. Our analysis highlights the current progress and pinpoints future directions for further improving a model's effectiveness by leveraging contextual information for pragmatic code generation.
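As a rough sketch of what test-based functional-correctness checking looks like (a simplified illustration only; CoderEval's actual platform executes tasks inside their original project environments, which this sketch omits):

```python
# Minimal, hypothetical sketch of checking a generated function against
# (inputs, expected output) test cases. Helper names are illustrative.
from typing import Callable, Iterable, Tuple

def is_functionally_correct(candidate: Callable,
                            tests: Iterable[Tuple[tuple, object]]) -> bool:
    """Return True if the candidate passes every (inputs, expected) test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True

# Example: check a generated implementation of word counting.
generated = lambda text: len(text.split())
tests = [(("hello world",), 2), (("",), 0), (("a  b   c",), 3)]
print(is_functionally_correct(generated, tests))  # True
```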

Fri 19 Apr

Displayed time zone: Lisbon

14:00 - 15:30
Language Models and Generated Code 3 (Research Track / Demonstrations) at Almada Negreiros
Chair(s): Jie M. Zhang King's College London
14:00
15m
Talk
CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
Research Track
Hao Yu Peking University, Bo Shen Huawei Cloud Computing Technologies Co., Ltd., Dezhi Ran Peking University, Jiaxin Zhang Huawei Cloud Computing Technologies Co., Ltd., Qi Zhang Huawei Cloud Computing Technologies Co., Ltd., Yuchi Ma Huawei Cloud Computing Technologies Co., Ltd., Guangtai Liang Huawei Cloud Computing Technologies Co., Ltd., Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Qianxiang Wang Huawei Technologies Co., Ltd., Tao Xie Peking University
14:15
15m
Talk
Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment
Research Track
Shibbir Ahmed Iowa State University, Hongyang Gao Dept. of Computer Science, Iowa State University, Hridesh Rajan Iowa State University
14:30
15m
Talk
GrammarT5: Grammar-Integrated Pretrained Encoder-Decoder Neural Model for Code
Research Track
Qihao Zhu Peking University, Qingyuan Liang Peking University, Zeyu Sun Institute of Software, Chinese Academy of Sciences, Yingfei Xiong Peking University, Lu Zhang Peking University, Shengyu Cheng ZTE Corporation
14:45
15m
Talk
On Calibration of Pre-trained Code Models
Research Track
Zhenhao Zhou Fudan University, Chaofeng Sha Fudan University, Xin Peng Fudan University
15:00
15m
Talk
Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models
Research Track
Shuzheng Gao, Wenxin Mao Harbin Institute of Technology, Cuiyun Gao Harbin Institute of Technology, Li Li Beihang University, Xing Hu Zhejiang University, Xin Xia Huawei Technologies, Michael Lyu The Chinese University of Hong Kong
15:15
7m
Talk
GitHubInclusifier: Finding and fixing non-inclusive language in GitHub Repositories
Demonstrations
Liam Todd Monash University, John Grundy Monash University, Christoph Treude Singapore Management University