ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to enhance automated vulnerability detection. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Through extensive experiments using GPT-3.5 and GPT-4o, we demonstrate that VulTrial outperforms single-agent and multi-agent baselines. With GPT-4o, VulTrial increases the number of correctly labeled pairs by 41 and 37 (roughly double) over its respective baselines. Additionally, we show that role-specific instruction tuning of VulTrial with a small amount of data (50 pair samples) increases the number of correctly labeled pairs by 56 and 52 points. Furthermore, we analyze the impact of increasing the number of agent interactions on VulTrial's overall performance. While multi-agent setups inherently incur higher costs due to increased token usage, our findings reveal that applying VulTrial to a cost-effective model like GPT-3.5 can outperform GPT-4o in a single-agent setting, at a lower overall cost.
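To make the courtroom analogy concrete, a minimal sketch of such a four-role pipeline is shown below. The role names follow the abstract; the prompt wording, the agent interface, and the majority-vote aggregation are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a courtroom-style multi-agent round in the spirit of
# VulTrial. Each agent is modeled as a function from prompt to reply; in
# practice each would wrap an LLM call (e.g., GPT-3.5 or GPT-4o).
from typing import Callable

Agent = Callable[[str], str]

def vultrial_round(code: str, researcher: Agent, author: Agent,
                   moderator: Agent, board: list[Agent]) -> str:
    """Run one adversarial round over `code` and return the board's verdict."""
    # 1. The security researcher argues why the code may be vulnerable.
    accusation = researcher(f"Argue that this code is vulnerable:\n{code}")
    # 2. The code author defends the code against the accusation.
    defense = author(f"Defend this code against the claim: {accusation}\n{code}")
    # 3. The moderator summarizes both sides for the review board.
    summary = moderator("Summarize the debate:\n"
                        f"Prosecution: {accusation}\nDefense: {defense}")
    # 4. Each review-board member votes; a simple majority decides.
    votes = [member(f"Given this summary, answer 'vulnerable' or 'benign':\n{summary}")
             for member in board]
    n_vuln = sum("vulnerable" in v.lower() for v in votes)
    return "vulnerable" if n_vuln > len(board) / 2 else "benign"

# Stub agents with canned replies to demonstrate the control flow.
def stub(reply: str) -> Agent:
    return lambda prompt: reply

verdict = vultrial_round(
    "strcpy(buf, user_input);",
    researcher=stub("unbounded copy into a fixed buffer"),
    author=stub("the input length is checked upstream"),
    moderator=stub("prosecution cites strcpy; defense cites an upstream check"),
    board=[stub("vulnerable"), stub("vulnerable"), stub("benign")],
)
print(verdict)  # vulnerable (2 of 3 board votes)
```

Keeping the roles adversarial (accuser vs. defender) before a neutral verdict is what distinguishes this setup from a single-agent classifier; the stubbed replies above merely exercise the control flow.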