Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study (Full Paper)
Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which results in a lack of representation for functional programming languages. Consequently, these models often perform poorly on functional languages such as Haskell. To investigate whether this can be alleviated, we evaluate the performance of two language models for code, CodeGPT and UniXcoder, on the functional programming language Haskell. We fine-tune and evaluate the models on Haskell functions sourced from a publicly accessible Haskell dataset on HuggingFace. Additionally, we manually evaluate the models using our novel translated HumanEval dataset. Our automatic evaluation shows that knowledge of imperative programming languages acquired during the pre-training of LLMs may not transfer well to functional languages, but that code completion on functional languages is feasible. This highlights the need for more high-quality Haskell datasets. A manual evaluation on HumanEval-Haskell indicates that CodeGPT frequently generates empty predictions and extra comments, while UniXcoder more often produces incomplete or incorrect predictions. Finally, we release HumanEval-Haskell, along with the fine-tuned models and all code required to reproduce our experiments, on GitHub (https://github.com/haskellforge/haskellforge).
Sun 14 Apr (displayed time zone: Lisbon)
16:00 - 17:30 | FORGE 2024 Awards & Foundation Models for Code and Documentation Generation | Research Track | at Luis de Freitas Branco | Chair(s): Antonio Mastropaolo (Università della Svizzera italiana)
16:00 10m | Awards | Award Ceremony | Research Track
16:10 7m | Short-paper | Fine Tuning Large Language Model for Secure Code Generation (New Idea Paper) | Research Track | Junjie Li (Concordia University), Aseem Sangalay (Delhi Technological University), Cheng Cheng (Concordia University), Yuan Tian (Queen's University, Kingston, Ontario), Jinqiu Yang (Concordia University)
16:17 14m | Full-paper | Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study (Full Paper) | Research Track | Tim van Dam, Frank van der Heijden, Philippe de Bekker, Berend Nieuwschepen, Marc Otten, Maliheh Izadi (Delft University of Technology)
16:31 7m | Short-paper | On Evaluating the Efficiency of Source Code Generated by LLMs (New Idea Paper) | Research Track | Changan Niu (Software Institute, Nanjing University), Ting Zhang (Singapore Management University), Chuanyi Li (Nanjing University), Bin Luo (Nanjing University), Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas)
16:38 14m | Full-paper | PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4 (Full Paper) | Research Track | Seif Abukhalaf (Polytechnique Montreal), Mohammad Hamdaqa (Polytechnique Montréal), Foutse Khomh (École Polytechnique de Montréal)
16:52 7m | Short-paper | Creative and Correct: Requesting Diverse Code Solutions from AI Foundation Models (New Idea Paper) | Research Track | Scott Blyth (Monash University), Christoph Treude (Singapore Management University), Markus Wagner (Monash University, Australia)
16:59 7m | Short-paper | Commit Message Generation via ChatGPT: How Far Are We? (New Idea Paper) | Research Track
17:06 24m | Other | Discussion | Research Track