Delta Debugging for LLM-integrated Systems
This program is tentative and subject to change.
Large Language Models (LLMs) are increasingly integrated into software systems as automated decision-making components. These systems rely on instruction prompts written in natural language to encode complex workflows. However, debugging these prompts when LLMs produce undesired outputs remains challenging due to their black-box nature and the impracticality of manually inspecting large, complex inputs. Unlike traditional software, LLMs provide no access to execution paths or intermediate states, making it difficult to identify which input fragments are responsible for unexpected behavior.
This paper investigates whether delta debugging can be effectively applied to identify and isolate problematic parts of LLM inputs that lead to undesired outputs. We introduce semantic markers as an instrumentation technique that embeds unique identifiers in LLM inputs and extracts traceability information from chain-of-thought reasoning. We systematically evaluate whether these markers accurately identify causal input fragments and enable delta debugging to isolate minimal subsets responsible for incorrect outputs.
Through experiments on a benchmark representing development scenarios and case studies from production systems, we demonstrate that delta debugging with semantic markers can systematically pinpoint problematic input fragments in both development and production settings. Our investigation shows that this approach transforms prompt debugging from an ad-hoc manual process into a systematic methodology, enabling engineers to efficiently identify and address the root causes of unexpected LLM behavior in real-world applications.
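The reduction step described above builds on the classic ddmin delta debugging algorithm. A minimal sketch in Python, assuming the instrumented prompt has already been split into a list of marked fragments and that a hypothetical `fails` predicate re-runs the LLM pipeline and reports whether the undesired output persists (neither name comes from the paper):

```python
def ddmin(fragments, fails):
    """Classic ddmin: reduce `fragments` to a 1-minimal sublist
    on which the failure predicate `fails` still holds.
    Assumes fails(fragments) is True and fragments are unique."""
    n = 2  # current granularity: number of chunks
    while len(fragments) >= 2:
        chunk = len(fragments) // n
        subsets = [fragments[i:i + chunk]
                   for i in range(0, len(fragments), chunk)]
        reduced = False
        for subset in subsets:
            complement = [f for f in fragments if f not in subset]
            if fails(subset):       # failure reproduces on one chunk alone
                fragments, n, reduced = subset, 2, True
                break
            if fails(complement):   # failure reproduces without this chunk
                fragments, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(fragments):  # already at single-fragment granularity
                break
            n = min(n * 2, len(fragments))  # refine granularity and retry
    return fragments
```

In this setting, each call to `fails` would re-invoke the LLM on the reassembled fragment subset, and the semantic markers extracted from the chain-of-thought trace can pre-filter which fragments are worth keeping before the reduction starts, since LLM calls are far more expensive than the byte-level tests ddmin was designed around.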
Wed 15 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30 | Testing and Analysis 1 | SE In Practice (SEIP) / Research Track at Oceania IX | Chair(s): Michael Pradel (CISPA Helmholtz Center for Information Security)

11:00 (15m) Talk | BFix: Automated Safe Memory-Leak Fixing for Binary Code | Research Track | Wen Zhang, Botang Xiao, Qingchen Kong, Boyang Yi, Suxin Ji, Yage Hu, Songlan Wang, Wenwen Wang (University of Georgia, USA)

11:15 (15m) Talk | Learning without Forgetting: Towards Continual Learning of Fault Localization Models in Industrial Software Systems | Research Track | Chun Li (Nanjing University), Hui Li (Samsung Electronics (China) R&D Centre), Zhong Li (Nanjing University), Minxue Pan (Nanjing University), Xuandong Li (Nanjing University)

11:30 (15m) Talk | Memory-Efficient Large Language Models for Program Repair with Semantic-Guided Patch Generation | Research Track | Thanh Le-Cong (Singapore University of Technology and Design), Xuan-Bach D. Le (University of Melbourne), Toby Murray (University of Melbourne)

11:45 (15m) Talk | Addressing Test Flakiness: Practical Approaches in a Database-Reliant Industrial System | SE In Practice (SEIP) | George Vegelien (Delft University of Technology), Carolin Brandt (Delft University of Technology), Bas Graaf (Exact), Arie van Deursen (TU Delft)

12:00 (15m) Talk | XTrace: A Non-Invasive Dynamic Tracing Framework for Android Applications in Production | SE In Practice (SEIP) | Qi Hu, Jiangchao Liu, Lin Zhang, Edward Jiang, Xin Yu (ByteDance)

12:15 (15m) Talk | Delta Debugging for LLM-integrated Systems | SE In Practice (SEIP) | Hao-Nan Zhu (University of California, Davis), Muhammad Numair Mansur (Amazon Web Services), Martin Schäf (Amazon Web Services), Zeya Chen (Amazon Web Services), Tancrède Lepoint (Amazon), Willem Visser (Amazon Web Services)