The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models (ASE 2023 - Research Papers)

Who

Xin Zhou, Kisub Kim, Bowen Xu, Jiakun Liu, DongGyun Han, David Lo

Track

ASE 2023 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 12 Sep 2023 11:18 - 11:30 at Room C - Testing AI Systems 1 Chair(s): Leonardo Mariani

Abstract

Learning-based techniques, especially advanced Large Language Models (LLMs) for code, have gained considerable popularity in various software engineering (SE) tasks. However, most existing works focus on designing better learning-based models and pay less attention to the properties of datasets. Learning-based models, including popular LLMs for code, heavily rely on data, and the data’s properties (e.g., data distribution) could significantly affect their behavior. We conducted an exploratory study on the distribution of SE data and found that such data usually follows a skewed distribution (i.e., long-tailed distribution) where a small number of classes have an extensive collection of samples, while a large number of classes have very few samples. We investigate three distinct SE tasks and analyze the impacts of long-tailed distribution on the performance of LLMs for code. Our experimental results reveal that the long-tailed distribution has a substantial impact on the effectiveness of LLMs for code. Specifically, LLMs for code perform between 30.0% and 254.0% worse on data samples associated with infrequent labels compared to data samples of frequent labels. Our study provides a better understanding of the effects of long-tailed distributions on popular LLMs for code and insights for the future development of SE automation.
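The head/tail comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how frequent and infrequent labels are separated or which metric is used, so the cumulative-share cutoff (`head_share`), plain accuracy, the helper names, and the toy data below are all assumptions made for illustration.

```python
from collections import Counter


def split_head_tail(labels, head_share=0.5):
    """Return the 'head' labels: the most frequent labels that together
    cover at least `head_share` of all samples. The cutoff is an assumption."""
    counts = Counter(labels)
    total = len(labels)
    head, covered = set(), 0
    for label, n in counts.most_common():
        if covered >= head_share * total:
            break
        head.add(label)
        covered += n
    return head


def accuracy(pairs):
    """Fraction of (true, predicted) pairs that match; NaN for an empty list."""
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else float("nan")


def head_tail_accuracy(y_true, y_pred, head_share=0.5):
    """Accuracy on samples with frequent (head) labels vs. all other (tail) labels."""
    head = split_head_tail(y_true, head_share)
    head_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in head]
    tail_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t not in head]
    return accuracy(head_pairs), accuracy(tail_pairs)


# Hypothetical toy data: label "a" dominates, "b", "c", "d" form the tail.
y_true = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
y_pred = ["a"] * 5 + ["b"] + ["b", "a"] + ["a", "d"]

head_labels = split_head_tail(y_true)
head_acc, tail_acc = head_tail_accuracy(y_true, y_pred)
print(f"head labels: {sorted(head_labels)}")
print(f"head accuracy: {head_acc:.2f}, tail accuracy: {tail_acc:.2f}")
```

On this toy data a single label covers more than half of the samples, and the hypothetical model scores lower on the remaining labels, which is the kind of gap the abstract reports for real SE datasets.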

Link to Preprint

https://arxiv.org/pdf/2309.03567.pdf

Xin Zhou

Singapore Management University, Singapore

Singapore

Kisub Kim

Singapore Management University, Singapore

Singapore

Bowen Xu

North Carolina State University

United States

Jiakun Liu

Singapore Management University

DongGyun Han

Royal Holloway, University of London

United Kingdom

David Lo

Singapore Management University

Singapore

Session Program

Tue 12 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:00	Testing AI Systems 1 (NIER Track / Research Papers) at Room C. Chair(s): Leonardo Mariani (University of Milano-Bicocca)

10:30 12m Talk		Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. Research Papers. Li Tsz On (The Hong Kong University of Science and Technology); Wenxi Zong (Northeastern University); Yibo Wang (Northeastern University); Haoye Tian (University of Luxembourg); Ying Wang (Northeastern University); Shing-Chi Cheung (Hong Kong University of Science and Technology); Jeffrey Kramer (Imperial College London)
10:42 12m Talk		SOCRATEST - Towards Autonomous Testing Agents via Conversational Large Language Models. NIER Track. Robert Feldt (Chalmers University of Technology, Sweden); Sungmin Kang (KAIST); Juyeon Yoon (Korea Advanced Institute of Science and Technology); Shin Yoo (KAIST)
10:54 12m Research paper		Semantic Data Augmentation for Deep Learning Testing using Generative AI. NIER Track. Sondess Missaoui (University of York); Simos Gerasimou (University of York); Nicholas Matragkas (Université Paris-Saclay, CEA, List)
11:06 12m Talk		Robin: A Novel Method to Produce Robust Interpreters for Deep Learning-Based Code Classifiers. Research Papers. Zhen Li (Huazhong University of Science and Technology); Ruqian Zhang (Huazhong University of Science and Technology); Deqing Zou (Huazhong University of Science and Technology); Ning Wang (Huazhong University of Science and Technology); Yating Li (Huazhong University of Science and Technology); Shouhuai Xu (University of Colorado Colorado Springs); Chen Chen (University of Central Florida); Hai Jin (Huazhong University of Science and Technology)
11:18 12m Talk		The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models. Research Papers. Xin Zhou (Singapore Management University, Singapore); Kisub Kim (Singapore Management University, Singapore); Bowen Xu (North Carolina State University); Jiakun Liu (Singapore Management University); DongGyun Han (Royal Holloway, University of London); David Lo (Singapore Management University)
11:30 12m Talk		CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space (recorded talk). Research Papers. Haibin Zheng (Zhejiang University of Technology); Jinyin Chen (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China); Haibo Jin (Zhejiang University of Technology)