SemGuard: Real-Time Semantic Evaluator for Correcting LLM-Generated Code (ASE 2025 - Research Papers)

Who

Qinglin Wang, Zhihong Sun, Ruyun Wang, Tao Huang, Zhi Jin, Ge Li, Chen Lyu

Track

ASE 2025 Research Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 19 Nov 2025 11:20 - 11:30 at Grand Hall 1 - Program Repair 2

Abstract

Large Language Models (LLMs) can translate natural-language requirements into code, yet empirical analyses of representative models reveal that semantic errors (programs that compile but behave incorrectly) constitute the majority of observed faults (e.g., >60% on DeepSeek-Coder-6.7B and QwenCoder-7B). Post-hoc repair pipelines detect such faults only after execution. This incurs latency, relies on incomplete test suites, and often mis-localizes the defect. Since semantic drift originates in the autoregressive decoding process, intervening while the code is being generated is a direct way to stop error propagation. Constrained-decoding approaches such as ROCODE attempt this, but they still wait until the entire program runs to obtain feedback, and they rely on entropy heuristics that do not truly capture semantics. A more effective solution must inject semantic signals into the decoding process early and precisely.

We present SemGuard, a semantic-evaluator-driven framework that performs real-time, line-level semantic supervision. To train the evaluator, we build SemDiff, the first dataset with fine-grained annotations that mark the exact line where a correct and an incorrect implementation diverge. Once embedded in the LLM's decoder, the evaluator flags deviations in partial code, rolls back to the faulty line, and guides regeneration, without executing the program or requiring test cases. Across four benchmarks, SemGuard consistently outperforms state-of-the-art baselines. It lowers the semantic error rate by 19.86% on SemDiff relative to ROCODE, and lifts Pass@1 by 48.92% on the real-world LiveCodeBench with CodeLlama-7B. Similar gains hold for StarCoder2-7B on MBPP and for DeepSeek-Coder-6.7B on the Java benchmark SemDiff-Java, demonstrating model- and language-agnostic effectiveness.
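The abstract describes the evaluator-guided decoding loop only at a high level. The Python sketch below illustrates the general check-and-rollback idea under stated assumptions. It is not SemGuard's implementation. `propose_line` stands in for the LLM and `locate_fault` for the trained semantic evaluator. Both names, the rollback budget, and the toy "add must not subtract" rule are hypothetical. The real system works inside the decoder rather than on whole lines of text.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces (illustrative only, not SemGuard's API):
#   propose_line(prefix, attempt) -> next code line, or None once the program is complete
#   locate_fault(lines) -> index of the first line judged semantically wrong, or None
ProposeFn = Callable[[List[str], int], Optional[str]]
LocateFn = Callable[[List[str]], Optional[int]]


def guarded_decode(propose_line: ProposeFn, locate_fault: LocateFn,
                   max_lines: int = 50, max_rollbacks: int = 10) -> List[str]:
    """Generate code line by line, rolling back to any line the evaluator flags."""
    lines: List[str] = []
    attempts: Dict[int, int] = {}  # regeneration count per line position
    rollbacks = 0
    while len(lines) < max_lines:
        pos = len(lines)
        nxt = propose_line(lines, attempts.get(pos, 0))
        if nxt is None:  # generator signals the program is complete
            break
        lines.append(nxt)
        fault = locate_fault(lines)  # judge the partial program without executing it
        if fault is not None and rollbacks < max_rollbacks:
            rollbacks += 1
            attempts[fault] = attempts.get(fault, 0) + 1
            # Later positions will see a new prefix, so their counters restart.
            for p in [p for p in attempts if p > fault]:
                del attempts[p]
            del lines[fault:]  # roll back to the faulty line and regenerate from there
    return lines


# Toy stand-ins: the "model" first writes a buggy body, then a corrected one.
CANDIDATES = {
    0: ["def add(a, b):"],
    1: ["    return a - b", "    return a + b"],
}


def toy_propose(prefix: List[str], attempt: int) -> Optional[str]:
    options = CANDIDATES.get(len(prefix))
    if options is None:
        return None
    return options[min(attempt, len(options) - 1)]


def toy_locate(lines: List[str]) -> Optional[int]:
    # Hypothetical rule standing in for a learned evaluator: "add" must not subtract.
    for i, line in enumerate(lines):
        if "a - b" in line:
            return i
    return None


print("\n".join(guarded_decode(toy_propose, toy_locate)))
```

Keying the attempt counter by line position lets the generator offer a different candidate after each rollback. The rollback budget guarantees termination if the evaluator keeps objecting. Once the budget is spent, the draft is kept as-is.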

Qinglin Wang

Shandong Normal University

Zhihong Sun

Shandong Normal University

Ruyun Wang

Institute of Information Engineering, Chinese Academy of Sciences

Tao Huang

Shandong Normal University

China

Zhi Jin

Peking University

China

Ge Li

Peking University

China

Chen Lyu

Shandong Normal University

China

Session Program

Wed 19 Nov
Displayed time zone: Seoul

11:00 - 12:30  Program Repair 2 (Research Papers) at Grand Hall 1

11:00 10m Talk: Automated Repair of Ambiguous Problem Descriptions for LLM-Based Code Generation (Research Papers). Haoxiang Jia, Peking University; Robbie Morris, University College London; He Ye, University College London (UCL); Federica Sarro, University College London; Sergey Mechtaev, Peking University
11:10 10m Talk: Fixing Broken Graphs: LLM-Powered Automatic Code Optimization for DNN Programs (Research Papers). Haotian Wang, Nankai University; Yicheng Sui, Nankai University; Yudong Xie, Nankai University; Yicong Liu, Nankai University; Yufei Sun, Nankai University; Changqing Shi, Nankai University; Yuzhi Zhang, Nankai University
11:20 10m Talk: SemGuard: Real-Time Semantic Evaluator for Correcting LLM-Generated Code (Research Papers). Qinglin Wang, Shandong Normal University; Zhihong Sun, Shandong Normal University; Ruyun Wang, Institute of Information Engineering, Chinese Academy of Sciences; Tao Huang, Shandong Normal University; Zhi Jin, Peking University; Ge Li, Peking University; Chen Lyu, Shandong Normal University
11:30 10m Talk: Amur: Fixing Multi-Resource Leaks Guided by Resource Flow Analysis (Research Papers). Jinyoung Kim, Sungkyunkwan University; Eunseok Lee, Sungkyunkwan University
11:40 10m Talk: Automated Repair of OpenID Connect Programs (Research Papers). Tamjid Al Rahat, University of Virginia; Yanju Chen, University of California, San Diego; Yu Feng, University of California at Santa Barbara; Yuan Tian
11:50 10m Talk: FlakyGuard: Automatically Fixing Flaky Tests at Industry Scale (Research Papers). Chengpeng Li, University of Texas at Austin; Farnaz Behrang, Uber Technologies; August Shi, The University of Texas at Austin; Peng Liu, Uber Technologies
12:00 10m Talk: LLMPort: Cross-file Patch Porting via Task Decomposition and Self-correction (Research Papers). Bofei Chen, Fudan University; Lei Zhang, Fudan University; Peng Deng, Fudan University; Nan Wang, Fudan University; Haoyu Xu, Fudan University; Mingda Guo, Fudan University; Yuan Zhang, Fudan University; Min Yang, Fudan University
12:10 10m Talk: Repairing Leaks in Resource Wrappers (Research Papers). Sanjay Malakar, University of California, Riverside; Martin Kellogg, New Jersey Institute of Technology; Michael D. Ernst, University of Washington; Manu Sridharan, University of California at Riverside
12:20 10m Talk: Automatic Fixing of Missing Dependency Errors (Research Papers). Jun Lyu, Nanjing University; He Zhang, Nanjing University; Lanxin Yang, Nanjing University; Yue Li, Nanjing University; Chenxing Zhong, Nanjing University; Manuel Rigger, National University of Singapore