Optimizing LLMs for Code Generation: Which Hyperparameter Settings Yield the Best Results? (APSEC 2024 - Technical Track)

Who

Chetan Arora, Ahnaf Ibn Sayeed, Sherlock A. Licorish, Fanyu Wang, Christoph Treude

Track

APSEC 2024 Technical Track

Time Zone

The program is currently displayed in (GMT+08:00) Beijing, Chongqing, Hong Kong, Urumqi.

When

Fri 6 Dec 2024 10:00 - 10:30 at Room 1 (Zunhui Room) - Session (16) Chair(s): Haoye Tian

Abstract

Large Language Models (LLMs), such as GPT models, are increasingly used in software engineering for various tasks, such as code generation, requirements management, and debugging. While automating these tasks has garnered significant attention, a systematic study on the impact of varying hyperparameters on code generation outcomes remains unexplored. This study aims to assess LLMs’ code generation performance by exhaustively exploring the impact of various hyperparameters. Hyperparameters for LLMs are adjustable settings that affect the model’s behaviour and performance. Specifically, we investigated how changes to the hyperparameters—temperature, top probability (top_p), frequency penalty, and presence penalty—affect code generation outcomes. We systematically adjusted all hyperparameters together, exploring every possible combination by making small increments to each hyperparameter at a time. This exhaustive approach was applied to 13 Python code generation tasks, yielding one of four outcomes for each hyperparameter combination: no output from the LLM, non-executable code, code that fails unit tests, or correct and functional code. We analysed these outcomes for a total of 14,742 generated Python code segments, focusing on correctness, to determine how the hyperparameters influence the LLM to arrive at each outcome. Using correlation coefficient and regression tree analyses, we ascertained which hyperparameters influence which aspect of the LLM. Our results indicate optimal performance with a temperature below 0.5, top probability below 0.75, frequency penalty between -1 and +1.5, and presence penalty above -1. We make our dataset and results available to facilitate replication.
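The exhaustive sweep and four-way outcome classification described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the parameter ranges are assumed to be those accepted by the OpenAI chat API, the 0.5 step size is purely illustrative (the abstract does not state the increments used), and the helper names `build_grid`, `classify`, and `in_reported_range` are hypothetical. Whether the reported range boundaries are inclusive is also an assumption.

```python
import itertools
from enum import Enum


class Outcome(Enum):
    """The four outcomes named in the abstract."""
    NO_OUTPUT = "no output from the LLM"
    NOT_EXECUTABLE = "non-executable code"
    FAILS_TESTS = "code that fails unit tests"
    CORRECT = "correct and functional code"


def frange(start, stop, step):
    """Inclusive list of floats from start to stop, rounded to avoid drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def build_grid(step=0.5):
    """Every combination of the four hyperparameters.

    Ranges are ASSUMED (OpenAI chat API limits); the step is illustrative only.
    """
    temperature = frange(0.0, 2.0, step)
    top_p = frange(0.0, 1.0, step)
    frequency_penalty = frange(-2.0, 2.0, step)
    presence_penalty = frange(-2.0, 2.0, step)
    return list(itertools.product(
        temperature, top_p, frequency_penalty, presence_penalty))


def classify(generated_code, unit_test):
    """Map one generated code segment to one of the four outcomes.

    `unit_test` is a callable that receives the namespace produced by running
    the code and raises (e.g. AssertionError) if the code is wrong.
    NOTE: exec() on untrusted model output should only run in a sandbox.
    """
    if not generated_code or not generated_code.strip():
        return Outcome.NO_OUTPUT
    namespace = {}
    try:
        exec(compile(generated_code, "<generated>", "exec"), namespace)
    except Exception:
        return Outcome.NOT_EXECUTABLE
    try:
        unit_test(namespace)
    except Exception:
        return Outcome.FAILS_TESTS
    return Outcome.CORRECT


def in_reported_range(temperature, top_p, frequency_penalty, presence_penalty):
    """True if a combination lies in the ranges the abstract reports as optimal.

    Inclusive bounds on the frequency penalty are an assumption.
    """
    return (temperature < 0.5
            and top_p < 0.75
            and -1 <= frequency_penalty <= 1.5
            and presence_penalty > -1)
```

In a sweep of this shape, each tuple from `build_grid()` would be passed to the model for each task, the response classified with `classify`, and the resulting outcome table analysed against the hyperparameter values.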

Chetan Arora

Monash University

Australia

Ahnaf Ibn Sayeed

Monash University

Australia

Sherlock A. Licorish

University of Otago

New Zealand

Fanyu Wang

Monash University

Australia

Christoph Treude

Singapore Management University

Singapore

Session Program

Fri 6 Dec
Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

09:30 - 10:30	Session (16), Technical Track, at Room 1 (Zunhui Room). Chair(s): Haoye Tian, University of Melbourne

09:30 30m Talk		Enhancing Code Generation through Retrieval of Cross-Lingual Semantic Graphs Technical Track Zhijie Jiang National University of Defense Technology, Zejian Shi Fudan University, Xinyu Gao, Yun Xiong Fudan University
10:00 30m Talk		Optimizing LLMs for Code Generation: Which Hyperparameter Settings Yield the Best Results? Technical Track Chetan Arora Monash University, Ahnaf Ibn Sayeed Monash University, Sherlock A. Licorish University of Otago, Fanyu Wang Monash University, Christoph Treude Singapore Management University