Sat 3 May 2025 15:20 - 15:30 at 214 - Paper Session 3 Chair(s): Chao Peng

Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While incorrect code is often easy to identify, detecting vulnerabilities in functionally correct code is far more challenging, especially for developers with limited security expertise. This poses considerable security risks when using LLM-generated code and underscores the need for robust evaluation benchmarks that assess both functional correctness and security. Existing benchmarks such as CyberSecEval and SecurityEval attempt to address this need, but they are hindered by unclear and impractical specifications and thus fail to assess either functionality or security accurately. To tackle these deficiencies, we introduce CWEval, a novel outcome-driven evaluation framework designed to improve the evaluation of secure code generation by LLMs. CWEval assesses a code sample's functionality and security simultaneously, using high-quality task specifications and outcome-driven test oracles that yield high accuracy. Coupled with CWEval-Bench, a multilingual, security-critical coding benchmark, CWEval provides a rigorous empirical security evaluation of LLM-generated code, overcoming previous benchmarks' shortcomings. Our evaluations reveal that LLMs produce a notable portion of functional but insecure code, and expose serious inaccuracies in previous evaluations, ultimately contributing significantly to the field of secure code generation.
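The outcome-driven principle at the heart of CWEval can be illustrated with a small, self-contained sketch. Everything below (the task, the function names, and both oracles) is a hypothetical illustration of the concept, not CWEval's actual harness: a candidate solution is judged only by its observable outcomes, once against a functional oracle on benign input and once against a security oracle targeting a concrete CWE (here, CWE-22 path traversal). A functionally correct but insecure solution passes the first check and fails the second, which is exactly the "functional but insecure" code the abstract describes.

    import os
    import tempfile

    # Hypothetical CWE-22 (path traversal) task: read_user_file(base_dir, name)
    # should return the contents of base_dir/name and must not escape base_dir.

    def llm_candidate(base_dir: str, name: str) -> str:
        # Functionally correct but vulnerable: no containment check on 'name'.
        with open(os.path.join(base_dir, name)) as f:
            return f.read()

    def functional_oracle(candidate) -> bool:
        # Outcome-driven: judge only the observable result on a benign input.
        with tempfile.TemporaryDirectory() as base:
            with open(os.path.join(base, "notes.txt"), "w") as f:
                f.write("hello")
            return candidate(base, "notes.txt") == "hello"

    def security_oracle(candidate) -> bool:
        # Outcome-driven: a traversal payload must not leak data outside base_dir.
        with tempfile.TemporaryDirectory() as root:
            base = os.path.join(root, "public")
            os.mkdir(base)
            with open(os.path.join(root, "secret.txt"), "w") as f:
                f.write("top-secret")
            try:
                result = candidate(base, "../secret.txt")
            except Exception:
                return True  # refusing the malicious input is a secure outcome
            return "top-secret" not in result

    print(functional_oracle(llm_candidate))  # True  -> functionally correct
    print(security_oracle(llm_candidate))    # False -> functional but insecure

A secure candidate would, for example, resolve the joined path and verify it still lies under base_dir before reading; the oracles never inspect the code itself, only its behavior.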

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Paper Session 3 (LLM4Code) at 214
Chair(s): Chao Peng ByteDance
14:00
10m
Talk
Mix-of-Language-Experts Architecture for Multilingual Programming
LLM4Code
Yifan Zong University of Waterloo, Yuntian Deng University of Waterloo, Pengyu Nie University of Waterloo
14:10
10m
Talk
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
LLM4Code
Quinn Dougherty Unaffiliated, Ronak Mehta Unaffiliated
14:20
10m
Talk
LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving
LLM4Code
Md Sifat Hossain University of Dhaka, Anika Tabassum University of Dhaka, Md. Fahim Arefin University of Dhaka, Tarannum Shaila Zaman University of Maryland Baltimore County
14:30
10m
Talk
Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis
LLM4Code
Manish Shetty University of California, Berkeley, Naman Jain University of California, Berkeley, Adwait Godbole University of California, Berkeley, Sanjit A. Seshia University of California, Berkeley, Koushik Sen University of California at Berkeley
14:40
10m
Talk
Evaluating Language Models for Computer Graphics Code Completion
LLM4Code
Jan Kels Heinrich-Heine-Universität Düsseldorf, Abdelhalim Dahou GESIS – Leibniz-Institute for the Social Sciences, Brigitte Mathiak GESIS – Leibniz-Institute for the Social Sciences
14:50
10m
Talk
From Zero to Sixty at the Speed of RAG: Improving YAML Recipe Generation via Retrieval
LLM4Code
Farima Farmahinifarahani J.P. Morgan AI Research, Petr Babkin J.P. Morgan AI Research, Salwa Alamir J.P. Morgan AI Research, Xiaomo Liu J.P. Morgan AI Research
15:00
10m
Talk
SC-Bench: A Large-Scale Dataset for Smart Contract Auditing
LLM4Code
Shihao Xia The Pennsylvania State University, Mengting He The Pennsylvania State University, Linhai Song The Pennsylvania State University, Yiying Zhang University of California San Diego
15:10
10m
Talk
METAMON: Finding Inconsistencies between Program Documentation and Behavior using Metamorphic LLM Queries
LLM4Code
Hyunseok Lee KAIST, Gabin An KAIST, Shin Yoo KAIST
15:20
10m
Talk
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation
LLM4Code
Jinjun Peng Columbia University, Leyi Cui Columbia University, Kele Huang Columbia University, Junfeng Yang Columbia University, Baishakhi Ray Columbia University