Understanding Code Changes Practically with Small-Scale Language Models
Recent studies have revealed that traditional techniques for understanding code changes are less effective than directly prompting large language models (LLMs). However, current LLM-based techniques rely heavily on commercial, large-scale models such as GPT-3.5 and GPT-4, which hinders their widespread practical deployment. This paper investigates the feasibility of deploying small-scale LLMs while maintaining performance comparable or superior to commercial, larger-scale LLMs in code change understanding. To this end, we developed a small yet high-quality dataset called HQCM, which was reviewed, revised, and validated by five human experts. After fine-tuning small-scale (7B and 220M) LLMs on it, our evaluation confirmed the significant benefits brought by HQCM and indicated that small-scale LLMs fine-tuned on HQCM can outperform state-of-the-art baselines and larger-scale (>=70B) LLMs in change understanding across change summarization, change classification, and code refinement. This study supports the use of small-scale LLMs in industry and in resource-constrained settings such as embedded systems, distinguishing our work from prior studies.
Thu 31 Oct (displayed time zone: Pacific Time, US & Canada)
13:30 - 15:00 | Code and issue report (Research Papers) at Magnolia
Chair(s): Baishakhi Ray (Columbia University, New York; AWS AI Lab)
13:30 15m Talk | PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code (Research Papers)
Ziyou Jiang (Institute of Software at Chinese Academy of Sciences), Lin Shi (Beihang University), Guowei Yang (University of Queensland), Qing Wang (Institute of Software at Chinese Academy of Sciences)
DOI Pre-print
13:45 15m Talk | Understanding Code Changes Practically with Small-Scale Language Models (Research Papers)
Cong Li (Zhejiang University; Ant Group), Zhaogui Xu (Ant Group), Peng Di (Ant Group), Dongxia Wang (Zhejiang University), Zheng Li (Ant Group), Qian Zheng (Ant Group)
14:00 15m Talk | DRMiner: Extracting Latent Design Rationale from Jira Issue Logs (Research Papers)
Jiuang Zhao (Beihang University), Zitian Yang (Beihang University), Li Zhang (Beihang University), Xiaoli Lian (Beihang University, China), Donghao Yang (Beihang University), Xin Tan (Beihang University)
14:15 15m Talk | An Empirical Study on Learning-based Techniques for Explicit and Implicit Commit Messages Generation (Research Papers)
Zhiquan Huang (Sun Yat-sen University), Yuan Huang (Sun Yat-sen University), Xiangping Chen (Sun Yat-sen University), Xiaocong Zhou (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China), Changlin Yang (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
14:30 15m Talk | RCFG2Vec: Considering Long-Distance Dependency for Binary Code Similarity Detection (Research Papers)
Weilong Li (School of Computer Science and Engineering, Sun Yat-sen University), Jintian Lu (College of Computer Science and Engineering, Jishou University), Ruizhi Xiao (School of Computer Science and Engineering, Sun Yat-sen University), Pengfei Shao (China Southern Power Grid Digital Grid Group Information and Telecommunication Technology Co., Ltd.), Shuyuan Jin (School of Computer Science and Engineering, Sun Yat-sen University)
14:45 15m Talk | ChatBR: Automated assessment and improvement of bug report quality using ChatGPT (Research Papers)
Lili Bo (Yangzhou University), Wangjie Ji (Yangzhou University), Xiaobing Sun (Yangzhou University), Ting Zhang (Singapore Management University), Xiaoxue Wu (Yangzhou University), Ying Wei (Yangzhou University)