Are Large Language Models a Threat to Programming Platforms? An Exploratory Study (ESEIW 2024 - ESEM Technical Papers Track)

Who

Md Mustakim Billah, Palash Ranjan Roy, Zadia Codabux, Banani Roy

Track

ESEIW 2024 ESEM Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Brussels, Copenhagen, Madrid, Paris.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 25 Oct 2024 11:40 - 12:00 at Telensenyament (B3 Building - 1st Floor) - Large language models in software engineering I Chair(s): Phuong T. Nguyen

Abstract

Background: Competitive programming platforms such as LeetCode, Codeforces, and HackerRank provide challenges to evaluate programming skills. Technical recruiters frequently utilize these platforms as a criterion for screening resumes. With the recent advent of advanced Large Language Models (LLMs) like ChatGPT, Gemini, and Meta AI, there is a need to assess their performance.

Aims: This study aims to assess LLMs’ capability to solve diverse programming challenges across programming platforms and difficulty levels, providing insights into their performance in real-time and offline scenarios, comparing them to human programmers, and identifying potential threats to established norms in programming platforms.

Method: The study utilized 98 problems from LeetCode and 126 from Codeforces, covering 15 categories and varying difficulty levels. Then, we participated in nine online contests from Codeforces and LeetCode. Finally, two certification tests were attempted on HackerRank to gain insights into LLMs’ real-time performance. Prompts were used to guide LLMs in solving problems, and iterative feedback mechanisms were employed. We also tried to find any possible correlation among the LLMs in different scenarios.

Results: LLMs generally achieved higher success rates on LeetCode (e.g., ChatGPT at 71.43%) but faced challenges on Codeforces. While excelling in HackerRank certifications, they struggled in virtual contests, especially on Codeforces. Despite diverse performance trends, ChatGPT consistently performed well across categories, yet all LLMs struggled with harder problems and lower acceptance rates. In LeetCode archive problems, LLMs generally outperformed users in time efficiency and memory usage but exhibited moderate performance in live contests, particularly in harder Codeforces contests compared to humans.

Conclusions: While not necessarily a threat, the performance of LLMs on programming platforms is indeed a cause for concern. With the prospect of more efficient models emerging in the future, programming platforms need to address this issue promptly.

Link to Preprint

https://arxiv.org/abs/2409.05824

Md Mustakim Billah

University of Saskatchewan

Canada

Palash Ranjan Roy

University of Saskatchewan

Canada

Zadia Codabux

University of Saskatchewan

Canada

Banani Roy

University of Saskatchewan

Canada

Session Program

Fri 25 Oct
Displayed time zone: Brussels, Copenhagen, Madrid, Paris

11:00 - 12:30	Large language models in software engineering I (ESEM Technical Papers / ESEM Emerging Results, Vision and Reflection Papers Track) at Telensenyament (B3 Building - 1st Floor) Chair(s): Phuong T. Nguyen, University of L’Aquila

11:00 20m Full-paper		Optimizing the Utilization of Large Language Models via Schedule Optimization: An Exploratory Study ESEM Technical Papers Yueyue Liu The University of Newcastle, Hongyu Zhang Chongqing University, Zhiqiang Li Shaanxi Normal University, Yuantian Miao The University of Newcastle
11:20 20m Full-paper		A Comparative Study on Large Language Models for Log Parsing ESEM Technical Papers Merve Astekin Simula Research Laboratory, Max Hort Simula Research Laboratory, Leon Moonen Simula Research Laboratory and BI Norwegian Business School
11:40 20m Full-paper		Are Large Language Models a Threat to Programming Platforms? An Exploratory Study ESEM Technical Papers Md Mustakim Billah University of Saskatchewan, Palash Ranjan Roy University of Saskatchewan, Zadia Codabux University of Saskatchewan, Banani Roy University of Saskatchewan Pre-print
12:00 15m Vision and Emerging Results		Automatic Library Migration Using Large Language Models: First Results ESEM Emerging Results, Vision and Reflection Papers Track Aylton Almeida UFMG, Laerte Xavier PUC Minas, Marco Tulio Valente Federal University of Minas Gerais, Brazil
12:15 15m Vision and Emerging Results		Evaluating Large Language Models in Exercises of UML Class Diagram Modeling ESEM Emerging Results, Vision and Reflection Papers Track Daniele De Bari Politecnico di Torino, Giacomo Garaccione Politecnico di Torino, Riccardo Coppola Politecnico di Torino, Marco Torchiano Politecnico di Torino, Luca Ardito Politecnico di Torino