Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Repair
Large language model (LLM)-based automated program repair (APR) techniques have shown promising results in resolving real-world GitHub issue tasks. Existing APR systems are primarily evaluated in unimodal settings (e.g., SWE-bench), relying solely on textual issue descriptions and source code. However, these autonomous systems struggle to resolve multimodal problem scenarios (e.g., SWE-bench M) due to limitations in interpreting and leveraging visual information. In multimodal scenarios, LLMs need to rely on visual information in the graphical user interface (GUI) to understand bugs and generate fixes. To bridge this gap, we propose GUIRepair, a cross-modal reasoning approach for resolving multimodal issue scenarios by understanding and capturing visual information. Specifically, GUIRepair integrates two key components, Image2Code and Code2Image, to enhance fault comprehension and patch validation. Image2Code extracts relevant project documents based on the issue report, then applies this domain knowledge to generate the reproduced code responsible for the visual symptoms, effectively translating GUI images into executable context for better fault comprehension. Code2Image replays the visual issue scenario using the reproduced code and captures GUI renderings of the patched program to assess whether the fix visually resolves the issue, providing feedback for patch validation. We evaluate GUIRepair on SWE-bench M, where it proves effective. With GPT-4o as the base model, GUIRepair solves 157 instances, outperforming the best open-source baseline by 26 instances. With o4-mini as the base model, GUIRepair achieves even better results, solving 175 instances and outperforming the top commercial system by 22 instances. These results support our new perspective of incorporating cross-modal reasoning, understanding and capturing visual information, to resolve multimodal issues.
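The two-component flow described in the abstract can be pictured roughly as the loop below. This is only an illustrative sketch, not the authors' implementation: the `Issue` type, the `repair` function, and every callable passed into it (`retrieve_docs`, `image2code`, `propose_patches`, `code2image`, `looks_fixed`) are hypothetical names introduced here, and the abstract does not say how patches are generated or how renderings are judged.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Issue:
    """Hypothetical container for a multimodal issue report."""
    text: str
    screenshots: list[bytes] = field(default_factory=list)  # GUI images from the report


def repair(
    issue: Issue,
    retrieve_docs: Callable[[Issue], list[str]],
    image2code: Callable[[Issue, list[str]], str],
    propose_patches: Callable[[Issue, str], Iterable[str]],
    code2image: Callable[[str, str], bytes],
    looks_fixed: Callable[[Issue, bytes], bool],
) -> Optional[str]:
    """Return the first candidate patch whose rendering looks fixed, else None.

    Every callable is an assumed stand-in for a model or tool call.
    """
    # Image2Code: gather project documents, then generate reproduction code
    # that recreates the visual symptom shown in the issue.
    docs = retrieve_docs(issue)
    repro_code = image2code(issue, docs)

    # Code2Image: replay the scenario against each patched program, capture
    # the GUI rendering, and use it as feedback for patch validation.
    for patch in propose_patches(issue, repro_code):
        rendering = code2image(repro_code, patch)
        if looks_fixed(issue, rendering):
            return patch
    return None
```

Passing the stages in as callables keeps the sketch independent of any particular model, browser, or rendering tool; in practice each one would wrap a multimodal LLM call or a GUI replay step.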
Mon 17 Nov
Displayed time zone: Seoul
11:00 - 12:30
11:00 | 10m Talk | Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs | Research Papers | Jian Wang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Shangqing Liu (Nanjing University), Jiongchi Yu (Singapore Management University), Jiaolong Kong (Singapore Management University), Yi Li (Nanyang Technological University)
11:10 | 10m Talk | MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning | Journal-First | Boyang Yang (Yanshan University), Haoye Tian (Aalto University), Jiadong Ren (Yanshan University), Hongyu Zhang (Chongqing University), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg), Claire Le Goues (Carnegie Mellon University), Shunfu Jin (Yanshan University)
11:20 | 10m Talk | Test-based Patch Clustering for Automatically-Generated Patches Assessment | Journal-First | Matias Martinez (Universitat Politècnica de Catalunya (UPC)), Maria Kechagia (National and Kapodistrian University of Athens), Anjana Perera (Oracle Labs, Australia), Justyna Petke (University College London), Federica Sarro (University College London), Aldeida Aleti (Monash University)
11:30 | 10m Talk | Hierarchical Knowledge Injection for Improving LLM-based Program Repair | Research Papers | Ramtin Ehsani (Drexel University), Esteban Parra Rodriguez (Belmont University), Sonia Haiduc (Florida State University), Preetha Chatterjee (Drexel University, USA)
11:40 | 10m Talk | Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges | Research Papers | Noor Nashid (University of British Columbia), Daniel Ding (University of British Columbia), Keheliya Gallaba (Centre for Software Excellence), Ahmed E. Hassan (Queen’s University), Ali Mesbah (University of British Columbia)
11:50 | 10m Talk | Reinforcement Learning for Mutation Operator Selection in Automated Program Repair | Journal-First | Carol Hanna (University College London), Aymeric Blot (University of Rennes, IRISA / INRIA), Justyna Petke (University College London)
12:00 | 10m Talk | Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Repair | Research Papers | Kai Huang (Technical University of Munich), Jian Zhang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Chunyang Chen (TU Munich)