The rapid advancement of large language models (LLMs) has opened new avenues for automating complex problem-solving tasks such as algorithmic coding and competitive programming. This paper introduces a novel evaluation technique, LLM-ProS, to assess the performance of state-of-the-art LLMs on International Collegiate Programming Contest (ICPC)-style problems. Using a curated dataset of 83 problems from the 2011–2016 and 2024 World Finals, we benchmark the models' reasoning, accuracy, and efficiency. We evaluate GPT-4o, Mistral Large, Llama-3.1-405B, and two models from the o1 family (o1-mini and o1-preview) across key metrics: correctness, resource utilization, and response calibration. Our results reveal significant differences in the models' abilities to generalize, adapt, and solve novel problems. Additionally, we investigate the impact of training methodologies, dataset contamination, and chain-of-thought reasoning on model performance. The findings provide new insights into optimizing LLMs for algorithmic tasks, highlighting both the strengths and limitations of current models.
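To make the correctness metric concrete, the sketch below shows one common way to judge an ICPC-style submission: run a model-generated program against a problem's test cases under a time limit and accept only if every case passes. This is an illustrative sketch, not the authors' harness; the file name generated_solution.py, the 2-second limit, and the helper names are assumptions.

```python
# Illustrative sketch of ICPC-style, all-or-nothing correctness scoring.
# All names (TestCase, score_problem, generated_solution.py) are hypothetical.
import subprocess
from dataclasses import dataclass


@dataclass
class TestCase:
    stdin: str
    expected_stdout: str


def run_case(cmd: list[str], case: TestCase, time_limit_s: float = 2.0) -> bool:
    """Return True if the program's output matches the expected output in time."""
    try:
        proc = subprocess.run(
            cmd, input=case.stdin, capture_output=True,
            text=True, timeout=time_limit_s,
        )
    except subprocess.TimeoutExpired:
        return False  # exceeding the time limit counts as a failure
    return proc.returncode == 0 and proc.stdout.strip() == case.expected_stdout.strip()


def score_problem(cmd: list[str], cases: list[TestCase]) -> bool:
    """A problem is solved only if every test case passes (ICPC convention)."""
    return all(run_case(cmd, c) for c in cases)


if __name__ == "__main__":
    # Hypothetical usage: judge a model-generated Python solution for one problem.
    cases = [TestCase(stdin="2 3\n", expected_stdout="5")]
    verdict = score_problem(["python3", "generated_solution.py"], cases)
    print("accepted" if verdict else "rejected")
```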