What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? (ASE 2023 - Research Papers)

Who

Shuzheng Gao, Xin-Cheng Wen, Cuiyun Gao, Wenxuan Wang, Hongyu Zhang, Michael Lyu

Track

ASE 2023 Research Papers

Time Zone

Times are given in the conference time zone, (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Wed 13 Sep 2023 14:06 - 14:18 at Plenary Room 2, Code Summarization session. Chair(s): Ray Buse

Abstract

Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, as model and corpus sizes have scaled up, large language models have shown the ability of in-context learning (ICL). ICL uses task instructions and a few examples as demonstrations, which are fed to the language model as input for making predictions. This new learning paradigm is training-free and has shown impressive performance in various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, e.g., the selected examples. It is therefore important to systematically investigate how to construct a good demonstration for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. We conduct extensive experiments on three code intelligence tasks: code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations from these three perspectives. We also show that a carefully designed demonstration based on our findings can lead to substantial improvements over widely used demonstration construction methods, e.g., improving BLEU-4 on code summarization, EM on bug fixing, and EM on program synthesis by at least 9.90%, 175.96%, and 50.81%, respectively.
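
As a rough illustration of the three factors named in the abstract (selection, order, and number of demonstrations), the following is a minimal, hypothetical Python sketch that assembles an ICL prompt for code summarization. The names `Example`, `tokenize`, `similarity`, `select_demonstrations`, and `build_prompt` are invented for illustration, and so are the design choices: token-level Jaccard similarity for selection, and placing the most similar example closest to the query. This is a sketch under those assumptions, not the demonstration-construction method evaluated in the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    """One labelled demonstration: a code snippet and its reference summary."""
    code: str
    summary: str


def tokenize(text: str) -> set[str]:
    # Assumption: a crude identifier-level tokenizer (word characters only).
    return set(re.findall(r"\w+", text))


def similarity(a: str, b: str) -> float:
    # Assumption: Jaccard similarity of token sets; 0.0 when both sets are empty.
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_demonstrations(query: str, pool: list[Example], k: int) -> list[Example]:
    # Selection: rank candidate examples by similarity to the query code.
    ranked = sorted(pool, key=lambda ex: similarity(query, ex.code), reverse=True)
    # Number: keep only the top-k candidates.
    chosen = ranked[:k]
    # Order (assumption): most similar example goes last, i.e. right before the query.
    return list(reversed(chosen))


def build_prompt(instruction: str, demos: list[Example], query: str) -> str:
    # Task instruction first, then the demonstrations, then the unanswered query.
    parts = [instruction]
    for ex in demos:
        parts.append(f"Code:\n{ex.code}\nSummary: {ex.summary}")
    parts.append(f"Code:\n{query}\nSummary:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    pool = [
        Example("def add(a, b): return a + b", "Add two numbers."),
        Example("def read_file(path): return open(path).read()", "Read a file."),
        Example("def sub(x, y): return x - y", "Subtract two numbers."),
    ]
    query = "def mul(a, b): return a * b"
    demos = select_demonstrations(query, pool, k=2)
    print(build_prompt("Generate a one-sentence summary for the code.", demos, query))
```

Changing `similarity`, the ordering rule in `select_demonstrations`, or `k` varies selection, order, and number independently, which keeps each factor easy to experiment with in isolation.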

Link to Preprint

https://arxiv.org/abs/2304.07575

File attachments

slides (ASE_ICL.pptx)	8.51MiB

Shuzheng Gao, The Chinese University of Hong Kong
Xin-Cheng Wen, Harbin Institute of Technology
Cuiyun Gao, Harbin Institute of Technology, China
Wenxuan Wang, The Chinese University of Hong Kong, China
Hongyu Zhang, Chongqing University, China
Michael Lyu, The Chinese University of Hong Kong

Session Program

Wed 13 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:30 - 15:00 Code Summarization (Research Papers) at Plenary Room 2. Chair(s): Ray Buse, Google

13:30 12m Talk: Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models. Liran Wang (Beihang University), Xunzhu Tang (University of Luxembourg), Yichen He (Beihang University), Changyu Ren (Beihang University), Shuhua Shi (Beihang University), Chaoran Yan (Beihang University), Zhoujun Li (Beihang University)
13:42 12m Talk: From Commit Message Generation to History-Aware Commit Message Completion. Aleksandra Eliseeva (JetBrains Research), Yaroslav Sokolov (JetBrains), Egor Bogomolov (JetBrains Research), Yaroslav Golubev (JetBrains Research), Danny Dig (JetBrains Research & University of Colorado Boulder, USA), Timofey Bryksin (JetBrains Research)
13:54 12m Talk: Automatic Generation and Reuse of Precise Library Summaries for Object-Sensitive Pointer Analysis. Jingbo Lu (University of New South Wales), Dongjie He (UNSW), Wei Li (University of New South Wales), Yaoqing Gao (Huawei Toronto Research Center), Jingling Xue (UNSW)
14:06 12m Talk: What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? Shuzheng Gao (The Chinese University of Hong Kong), Xin-Cheng Wen (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Wenxuan Wang (The Chinese University of Hong Kong), Hongyu Zhang (Chongqing University), Michael Lyu (The Chinese University of Hong Kong)
14:18 12m Talk: HexT5: Unified Pre-training for Stripped Binary Code Information Inference (recorded talk). Jiaqi Xiong (University of Science and Technology of China), Guoqiang Chen (University of Science and Technology of China), Kejiang Chen (University of Science and Technology of China), Han Gao (University of Science and Technology of China), Shaoyin Cheng (University of Science and Technology of China), Weiming Zhang (University of Science and Technology of China)
14:30 12m Talk: Generating Variable Explanations via Zero-shot Prompt Learning (recorded talk). Chong Wang (Fudan University), Yiling Lou (Fudan University), Liu Junwei (Fudan University), Xin Peng (Fudan University)