Beyond Syntax: How Do LLMs Understand Code?
Within software engineering research, Large Language Models (LLMs) are often treated as ‘black boxes’, with only their inputs and outputs being considered. In this paper, we take a mechanistic interpretability approach to examine how LLMs internally represent and process code.
We focus on variable declaration and function scope, training classifier probes on the residual streams of LLMs as they process code in order to explore how these concepts are represented internally across different programming languages. We also identify specific attention heads that support these representations and examine how they behave for inputs written in different languages.
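As an illustration of this probing setup, the following minimal sketch trains a linear probe on hidden-state (residual stream) activations; the model name, probe layer, snippets, and labels are placeholder assumptions rather than the exact configuration used in the paper.

```python
# Minimal linear-probe sketch: fit a classifier on residual-stream
# (hidden-state) activations to predict a code property such as
# "is this token inside a function scope?". Model, layer, and labels
# are placeholder assumptions, not the paper's configuration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"   # placeholder; any causal LM with hidden states works
PROBE_LAYER = 6       # hypothetical middle layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def residual_stream(snippet: str, layer: int) -> torch.Tensor:
    """Return the hidden states (residual stream) at `layer`, one vector per token."""
    tokens = tokenizer(snippet, return_tensors="pt")
    with torch.no_grad():
        out = model(**tokens)
    return out.hidden_states[layer][0]   # shape: (seq_len, d_model)

# Toy dataset: label whether the final token lies inside a function body (1)
# or at module level (0).
snippets = ["def f():\n    x = 1\n    x", "x = 1\nx"]
labels   = [1, 0]

features = torch.stack([residual_stream(s, PROBE_LAYER)[-1] for s in snippets])
probe = LogisticRegression(max_iter=1000).fit(features.numpy(), labels)
print("probe training accuracy:", probe.score(features.numpy(), labels))
```

In practice a separate probe would be trained for each layer and each programming language on many labelled token positions, with accuracy compared across layers to locate where the concept becomes linearly decodable.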
Our results show that LLMs have an understanding — and internal representation — of language-independent coding semantics that goes beyond the syntax of any specific programming language, using the same internal components to process code regardless of the programming language that the code is written in. Furthermore, we find evidence that these language-independent semantic components exist in the middle layers of LLMs and are supported by language-specific components in the earlier layers that parse the syntax of specific languages and feed into these later semantic components.
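One way to test the language-independence claim directly is cross-language probe transfer: train a probe on activations from one language and evaluate it, without retraining, on another. The sketch below is hypothetical, reuses the residual_stream helper and PROBE_LAYER from the previous example, and its snippets and labels are purely illustrative.

```python
# Hypothetical cross-language transfer check: fit the scope probe on Python
# activations, then score it on JavaScript activations at the same layer.
# High transfer accuracy in the middle layers would be consistent with a
# language-independent representation of the same scoping concept.
python_snippets = ["def g():\n    z = 3\n    z", "z = 3\nz"]
python_labels   = [1, 0]   # 1 = last token inside a function scope
js_snippets     = ["function g() {\n  let z = 3;\n  z", "let z = 3;\nz"]
js_labels       = [1, 0]

X_py = torch.stack([residual_stream(s, PROBE_LAYER)[-1] for s in python_snippets])
X_js = torch.stack([residual_stream(s, PROBE_LAYER)[-1] for s in js_snippets])

transfer_probe = LogisticRegression(max_iter=1000).fit(X_py.numpy(), python_labels)
print("cross-language transfer accuracy:", transfer_probe.score(X_js.numpy(), js_labels))
```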
Finally, we discuss the broader implications of our work, particularly in relation to concerns that AI, with its reliance on large datasets to learn new programming languages, might limit innovation in programming language design. By demonstrating that LLMs have a language-independent representation of code, we argue that LLMs may be able to flexibly learn the syntax of new programming languages while retaining their semantic understanding of universal coding concepts. In doing so, LLMs could promote creativity in future programming language design, providing tools that augment rather than constrain the future of software engineering.
Fri 2 May (displayed time zone: Eastern Time, US & Canada)
14:00 - 15:30 | AI for Analysis 5 | Research Track / New Ideas and Emerging Results (NIER) | Room 212 | Chair(s): Tien N. Nguyen (University of Texas at Dallas)

14:00 (15m) Talk: 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers (Research Track). Sarah Fakhoury, Markus Kuppe, Shuvendu K. Lahiri, Tahina Ramananandro, Nikhil Swamy (Microsoft Research). Pre-print available.

14:15 (15m) Talk: Aligning the Objective of LLM-based Program Repair (Research Track). Junjielong Xu (The Chinese University of Hong Kong, Shenzhen), Ying Fu (Chongqing University), Shin Hwei Tan (Concordia University), Pinjia He (Chinese University of Hong Kong, Shenzhen). Pre-print available.

14:30 (15m) Talk: Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models (Research Track). Aidan Z.H. Yang, Sophia Kolak, Vincent J. Hellendoorn, Ruben Martins, Claire Le Goues (Carnegie Mellon University).

14:45 (15m) Talk: The Fact Selection Problem in LLM-Based Program Repair (Research Track). Nikhil Parasaram (Uber Amsterdam), Huijie Yan (University College London), Boyu Yang (University College London), Zineb Flahy (University College London), Abriele Qudsi (University College London), Damian Ziaber (University College London), Earl T. Barr (University College London), Sergey Mechtaev (Peking University).

15:00 (15m) Talk: Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models (Research Track). Zhijie Wang (University of Alberta), Zijie Zhou (University of Illinois Urbana-Champaign), Da Song (University of Alberta), Yuheng Huang (University of Alberta, Canada), Shengmai Chen (Purdue University), Lei Ma (The University of Tokyo & University of Alberta), Tianyi Zhang (Purdue University). Pre-print available.

15:15 (15m) Talk: Beyond Syntax: How Do LLMs Understand Code? (New Ideas and Emerging Results (NIER)). Marc North, Amir Atapour-Abarghouei, Nelly Bencomo (Durham University).