LineBreaker: Finding Token-Inconsistency Bugs using Large Language Models (ASE 2025 - Research Papers)

Who

Hongbo Chen, Yifan Zhang, Xing Han, Tianhao Mao, Huanyao Rong, Yuheng Zhang, Hang Zhang, XiaoFeng Wang, Luyi Xing, Xun Chen

Track

ASE 2025 Research Papers

Time Zone

Times are given in the conference time zone, (GMT+09:00) Seoul.

When

Mon 17 Nov 2025, 12:20 - 12:30, Grand Hall 2, session "Bug Understanding 1" (Chair: Michael Pradel)

Abstract

Token-inconsistency bugs (TIBs) involve the misuse of code tokens that are syntactically valid yet incorrect in context, such as misused variables and erroneous function invocations, and they can often lead to incorrect program behavior. Unlike simple syntactic bugs, TIBs occur at the semantic level and are subtle, sometimes remaining undetected for years. Traditional detection methods, such as static analysis and dynamic testing, often struggle with TIBs because of their diverse and context-dependent nature. However, advances in large language models (LLMs) such as GPT-4 present new opportunities for automating TIB detection by leveraging these models’ semantic understanding of code.
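As an illustration of what a TIB looks like, here is a hypothetical Python example (written for this page, not taken from the paper or its evaluated repositories). The buggy function parses and runs, and a typical linter would not flag it; the error only becomes visible when the token `low` in the second branch is compared with its context, where the symmetric structure suggests `high` was intended.

```python
def clamp_buggy(value, low, high):
    """Clamp value into [low, high]; contains a token-inconsistency bug."""
    if value < low:
        return low
    if value > high:
        return low  # TIB: a valid token, but the wrong one; should be `high`
    return value


def clamp_fixed(value, low, high):
    """The same function with the inconsistent token corrected."""
    if value < low:
        return low
    if value > high:
        return high
    return value


buggy_result = clamp_buggy(15, 0, 10)
fixed_result = clamp_fixed(15, 0, 10)
```

For an input above the range, the buggy version silently returns the lower bound, while the fixed version returns the upper bound as intended.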

This paper reports the first systematic measurement of LLMs’ capabilities in detecting TIBs, revealing that while GPT-4 shows promise, it has limitations in precision and scalability. Specifically, its detection capability is undermined by the model’s tendency to focus on code snippets that do not contain TIBs, and its scalability is limited by GPT-4’s high cost and the massive amount of code requiring inspection. To address these challenges, we introduce LineBreaker, a novel cascaded TIB detection system. LineBreaker leverages smaller, code-specific, and highly efficient language models to filter out the large number of code snippets unlikely to contain TIBs, significantly improving the system’s precision, recall, and scalability. We evaluated LineBreaker on 154 Python and C GitHub repositories, each with over 1,000 stars, uncovering 123 new flaws, 45% of which could be exploited to disrupt program functionality. Of our 69 submitted fixes, 41 have already been confirmed or merged.
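The cascade idea can be sketched as a two-stage filter. This is a minimal sketch under stated assumptions, not LineBreaker’s implementation: `cascade_detect`, `cheap_score`, and `expensive_check` are hypothetical names, the two callables stand in for a small code-specific model and for GPT-4, and the fixed `threshold` of 0.5 is an arbitrary illustrative value. What it shows is structural: the expensive stage runs only on snippets that the cheap stage lets through, so its cost grows with the number of survivors rather than with the size of the codebase.

```python
from typing import Callable, Iterable, List, Tuple


def cascade_detect(
    snippets: Iterable[str],
    cheap_score: Callable[[str], float],
    expensive_check: Callable[[str], bool],
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Return (flagged snippets, number of expensive-stage calls)."""
    flagged: List[str] = []
    expensive_calls = 0
    for snippet in snippets:
        # Stage 1 (hypothetical small model): discard snippets scored as
        # unlikely to contain a TIB.
        if cheap_score(snippet) < threshold:
            continue
        # Stage 2 (hypothetical large model): inspect only the survivors.
        expensive_calls += 1
        if expensive_check(snippet):
            flagged.append(snippet)
    return flagged, expensive_calls


def toy_cheap_score(snippet: str) -> float:
    # Toy stand-in: treat snippets mentioning `high` as suspicious.
    return 0.9 if "high" in snippet else 0.1


def toy_expensive_check(snippet: str) -> bool:
    # Toy stand-in for the large model's verdict.
    return "return low" in snippet


snippets = ["total = a + b", "if x > high:\n    return low", "print(total)"]
flagged, calls = cascade_detect(snippets, toy_cheap_score, toy_expensive_check)
```

With these toy scorers, only the second snippet passes the first stage, so the expensive check is called once and flags that snippet. In a real system the threshold would trade recall (fewer snippets wrongly discarded) against cost (fewer expensive calls).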

Affiliations

Hongbo Chen, Indiana University Bloomington, United States

Yifan Zhang, San Diego State University, United States

Xing Han, The Hong Kong University of Science and Technology

Tianhao Mao, Indiana University

Huanyao Rong, Indiana University Bloomington

Yuheng Zhang, Tsinghua University

Hang Zhang, Indiana University

XiaoFeng Wang, ACM member

Luyi Xing, Indiana University Bloomington / University of Illinois Urbana-Champaign

Xun Chen, Samsung Research America