ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

This program is tentative and subject to change.

Wed 15 Apr 2026 14:30 - 14:45 at Asia IV - AI for Software Engineering 5

While software requirements are often expressed in natural language, verifying the correctness of a program against such requirements is a hard and underexplored problem. Large language models (LLMs) are promising candidates for addressing this challenge; however, our experience shows that they are ineffective at this task, often failing to detect even straightforward bugs. To address this gap, we introduce HoarePrompt, a novel approach that adapts fundamental ideas from program verification to natural language artifacts. Inspired by the strongest postcondition calculus, HoarePrompt employs a systematic, step-by-step process in which an LLM generates natural language descriptions of reachable program states at various code points. To manage loops, we propose few-shot-driven k-induction, an adaptation of the k-induction method widely used in model checking. Once program states are described, HoarePrompt leverages the LLM to assess whether the program, annotated with these state descriptions, conforms to the natural language requirements. To evaluate the quality of classifiers of program correctness with respect to natural language requirements, we constructed CoCoClaNeL, a challenging dataset of solutions to programming competition problems. Our experiments show that HoarePrompt improves the MCC by 61% compared to directly using Zero-shot-CoT prompts for correctness classification. Furthermore, HoarePrompt outperforms a classifier that assesses correctness via LLM-based test generation by an MCC increase of 106%. The inductive reasoning mechanism contributes a 26% boost to MCC, underscoring its effectiveness in managing loops.
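To give a concrete sense of the k-induction principle the abstract refers to, the sketch below checks a safety property of a small finite transition system: the base case verifies the property on the first k states of the actual execution, and the inductive step verifies that any k consecutive property-satisfying states are followed by another one. This is a minimal, self-contained illustration of classical k-induction, not HoarePrompt itself — HoarePrompt's few-shot-driven variant replaces explicit states with LLM-generated natural language state descriptions, and all names here are hypothetical.

```python
def check_k_induction(init, step, prop, k, states):
    """Prove that prop holds on all reachable states, via k-induction.

    init   - initial state
    step   - transition function: state -> next state
    prop   - property to verify: state -> bool
    k      - induction depth
    states - the (finite) state space to enumerate for the inductive step
    """
    # Base case: prop holds on the first k states reachable from init.
    x = init
    for _ in range(k):
        if not prop(x):
            return False, "base case fails"
        x = step(x)
    # Inductive step: any k consecutive states that all satisfy prop
    # must be followed by a state that also satisfies prop.
    for s in states:
        seq = [s]
        for _ in range(k - 1):
            seq.append(step(seq[-1]))
        if all(prop(v) for v in seq) and not prop(step(seq[-1])):
            return False, "inductive step fails"
    return True, "proved"


# Example: a counter stepping by 2 modulo 6, started at 0; the property
# "x != 1" is invariant, but plain (1-)induction cannot prove it because
# the unreachable state 5 satisfies the property yet steps to 1.
step = lambda x: (x + 2) % 6
prop = lambda x: x != 1
print(check_k_induction(0, step, prop, k=1, states=range(6)))  # inductive step fails
print(check_k_induction(0, step, prop, k=3, states=range(6)))  # proved
```

Deepening the induction to k=3 rules out the spurious counterexample: every length-3 path ending in the bad transition passes through state 1 itself, which violates the property, so the path is excluded from the inductive hypothesis.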


Wed 15 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
AI for Software Engineering 5
Research Track / SE In Practice (SEIP) at Asia IV
14:00
15m
Talk
SpecGuru: Hierarchical LLM-Driven API Points-to Specification Generation with Self-Validation
Research Track
Shuangxiang Kan UNSW, Yuekang Li UNSW, Xiao Cheng Macquarie University, Yulei Sui University of New South Wales
14:15
15m
Talk
Panoptes: A Profile Clustering Framework for Context-Aware Binary Optimization
Research Track
Edwin Kayang Arizona State University, Eric Jahns Arizona State University, Mishel Jyothis Paul Arizona State University, Michel Kinsy Arizona State University
14:30
15m
Talk
HoarePrompt: Structural Reasoning About Program Correctness in Natural Language
Research Track
Dimitrios Stamatios Bouras Peking University, Yihan Dai Nankai University, Tairan Wang University College London, Yingfei Xiong Peking University, Sergey Mechtaev Peking University
14:45
15m
Talk
Large Language Model-Aided Partial Program Dependence Analysis
Research Track
Xiaokai Rong The University of Texas at Dallas, Aashish Yadavally University of Central Florida, Tien N. Nguyen University of Texas at Dallas
Pre-print
15:00
15m
Talk
Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry
SE In Practice (SEIP)
Xueying Du Fudan University, Jiayi Feng Fudan University, Yi Zou Fudan University, Wei Xu Tencent, Jie Ma Tencent, Wei Zhang Tencent, Sisi Liu Tencent, Xin Peng Fudan University, Yiling Lou University of Illinois at Urbana-Champaign
15:15
15m
Talk
CASCADE: LLM-powered JavaScript Deobfuscator at Google
SE In Practice (SEIP)
Shan Jiang UT Austin, Pranoy Kovuri Google, David Tao Google, Zhixun Tan Google