Planning a Large Language Model for Static Detection of Runtime Errors in Code Snippets
Large Language Models (LLMs) have excelled at generating and reasoning about source code and its textual descriptions. They can recognize patterns, syntax, and semantics in code, making them effective in several software engineering tasks. However, they exhibit weaknesses in reasoning about program execution: they primarily operate on static code representations and fail to capture the dynamic behavior and state changes that occur during execution. In this paper, we advance the capabilities of LLMs in reasoning about program execution. We propose ORCA, a novel approach that instructs an LLM to autonomously formulate a plan for navigating a control flow graph (CFG) to perform predictive execution of (in)complete code snippets; the LLM acts as a predictive interpreter that "executes" the code. As a downstream task, we use ORCA to statically identify runtime errors in online code snippets. Early detection of runtime errors and defects in these snippets is crucial to avoid costly fixes later in the development cycle, after the snippets have been adapted into a codebase. In our technique, we guide the LLM to pause at each branching point and focus on the state of the symbol table for variables' values, thus minimizing error propagation in the LLM's computation. We also instruct the LLM not to stop after each step of its execution plan, so only one prompt to the LLM is needed, which substantially reduces cost. Our empirical evaluation showed that ORCA is effective and improves over state-of-the-art approaches in predicting execution traces and in detecting runtime errors.
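The abstract outlines the key ingredients of the approach: a single prompt, CFG-guided navigation, pauses at branching points, and a symbol table of variable values. The sketch below is not the authors' implementation; it is a minimal illustration, under assumed names such as build_predictive_execution_prompt and a hypothetical llm_complete helper, of how one prompt of that shape might be assembled for a toy CFG.

```python
# Minimal sketch (assumptions, not ORCA's actual implementation) of a single-prompt,
# CFG-guided "predictive execution" request for runtime-error detection.

PROMPT_TEMPLATE = """\
You are a predictive interpreter. Walk the control flow graph (CFG) below,
one basic block at a time, without stopping for follow-up questions.
At every branching block, pause and report the current symbol table
(variable -> value), then state which successor block you take and why.
If a statement would raise a runtime error (e.g., ZeroDivisionError,
IndexError, TypeError), stop and report the block and the error type.

Code snippet:
{code}

Control flow graph:
{cfg}

Produce the full execution plan and trace in a single answer.
"""

def build_predictive_execution_prompt(code_snippet, cfg_blocks):
    """Compose one prompt asking the LLM to 'execute' the snippet block by block."""
    cfg_listing = "\n".join(
        f"Block {bid}: {src!r} -> successors {succs}"
        for bid, src, succs in cfg_blocks
    )
    return PROMPT_TEMPLATE.format(code=code_snippet, cfg=cfg_listing)

# Toy example: an (in)complete snippet where one branch divides by zero.
snippet = (
    "x = len(data)\n"
    "if x > 0:\n"
    "    avg = total / x\n"
    "else:\n"
    "    avg = total / x"
)
cfg = [
    (0, "x = len(data)", [1]),
    (1, "if x > 0", [2, 3]),
    (2, "avg = total / x", []),
    (3, "avg = total / x", []),  # taken only when x == 0 -> ZeroDivisionError
]

prompt = build_predictive_execution_prompt(snippet, cfg)
# response = llm_complete(prompt)  # hypothetical single LLM call; any chat API could be used
print(prompt)
```

Packing the plan, the trace, and the branch-point symbol tables into one request reflects the cost argument in the abstract: the model is asked not to stop between steps, so only a single prompt is issued per snippet.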
Wed 30 Apr. Displayed time zone: Eastern Time (US & Canada)
16:00 - 17:30 | AI for Analysis 2 | Research Track / Journal-first Papers | Room 212
Chair(s): Julia Rubin (The University of British Columbia)

16:00 (15m) Talk | Neurosymbolic Modular Refinement Type Inference | Research Track
Georgios Sakkas (UC San Diego), Pratyush Sahu (UC San Diego), Kyeling Ong (University of California, San Diego), Ranjit Jhala (University of California at San Diego)

16:15 (15m) Talk | An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We? | Research Track
Hyunjae Suh (University of California, Irvine), Mahan Tafreshipour (University of California at Irvine), Jiawei Li (University of California, Irvine), Adithya Bhattiprolu (University of California, Irvine), Iftekhar Ahmed (University of California at Irvine)

16:30 (15m) Talk | Planning a Large Language Model for Static Detection of Runtime Errors in Code Snippets | Research Track
Smit Soneshbhai Patel (University of Texas at Dallas), Aashish Yadavally (University of Texas at Dallas), Hridya Dhulipala (University of Texas at Dallas), Tien N. Nguyen (University of Texas at Dallas)

16:45 (15m) Talk | LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion | Research Track
Chong Wang (Nanyang Technological University), Kaifeng Huang (Tongji University), Jian Zhang (Nanyang Technological University), Yebo Feng (Nanyang Technological University), Lyuye Zhang (Nanyang Technological University), Yang Liu (Nanyang Technological University), Xin Peng (Fudan University)

17:00 (15m) Talk | Knowledge-Enhanced Program Repair for Data Science Code | Research Track
Shuyin Ouyang (King's College London), Jie M. Zhang (King's College London), Zeyu Sun (Institute of Software, Chinese Academy of Sciences), Albert Merono Penuela (King's College London)

17:15 (7m) Talk | SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning | Journal-first Papers
Xueqi Yang (North Carolina State University), Mariusz Jakubowski (Microsoft), Li Kang (Microsoft), Haojie Yu (Microsoft), Tim Menzies (North Carolina State University)