Code Wars: Adversarial Self-Play for Evolving Software Validation Tools
This program is tentative and subject to change.
Self-play has reshaped the landscape of AI, producing agents like AlphaGo and AlphaZero that outmatch human mastery through recursive learning. In this work, we ask: can machines play against themselves not to win games, but to strengthen software? We explore this idea in the realm of software validation, where large language models (LLMs) trained on vast oceans of code hold within them entire worlds of logic, error, and repair. Our proof-of-concept system features three LLM-driven agents: an attacker that crafts flawed code, a defender that hunts for vulnerabilities, and a judge that weighs their wits. In this evolving dance of sabotage and defense, the agents sharpen each other, creating a self-sustaining arms race of refinement. In our preliminary evaluation of adversarial self-play for technical debt detection, we observe an improvement in accuracy from 19.73% to 49.66% on a benchmark dataset. Our findings offer an early indication that ideas inspired by self-play may contribute to the development of future tools for software validation and security.
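The attacker, defender, and judge roles described in the abstract can be pictured with a minimal sketch. This is an illustration only, not the authors' implementation: the names `Sample`, `judge`, `self_play`, `toy_attacker`, and `toy_defender` are hypothetical, and the LLM-driven agents are replaced by simple deterministic stand-ins so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """A code snippet plus the attacker's ground-truth label (hypothetical type)."""
    code: str
    has_flaw: bool


# Assumed interfaces: the attacker maps a round number to a sample,
# the defender maps source text to a "flaw detected" verdict.
Attacker = Callable[[int], Sample]
Defender = Callable[[str], bool]


def judge(sample: Sample, verdict: bool) -> bool:
    """Return True when the defender's verdict matches the ground truth."""
    return verdict == sample.has_flaw


def self_play(attacker: Attacker, defender: Defender, rounds: int) -> Tuple[float, List[Sample]]:
    """Run the attacker/defender/judge loop and return (accuracy, missed samples)."""
    correct = 0
    missed: List[Sample] = []
    for i in range(rounds):
        sample = attacker(i)
        verdict = defender(sample.code)  # the defender never sees the label
        if judge(sample, verdict):
            correct += 1
        else:
            missed.append(sample)  # candidates for refining the defender
    return correct / rounds, missed


def toy_attacker(i: int) -> Sample:
    """Stand-in attacker: alternates between a flawed and a clean snippet."""
    if i % 2 == 0:
        return Sample("def mean(xs): return sum(xs) / len(xs)  # TODO: empty list", True)
    return Sample("def mean(xs): return sum(xs) / len(xs) if xs else 0.0", False)


def toy_defender(code: str) -> bool:
    """Stand-in defender: flags any snippet that still carries a TODO marker."""
    return "TODO" in code


accuracy, missed = self_play(toy_attacker, toy_defender, rounds=4)
```

In a real system each stand-in would presumably wrap an LLM call, and the `missed` samples would feed whatever refinement step the agents use between rounds; both of those details are assumptions here, not claims about the paper.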
Fri 17 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 17:30 | AI for Software Engineering 27 | Research Track / New Ideas and Emerging Results (NIER) at Asia IV | Chair(s): Giuseppe Scanniello, University of Salerno
16:00 | 15m Talk | Setup AGent (SAG): A Dual-Model LLM Agent for Autonomous End-to-End Java Project Configuration | New Ideas and Emerging Results (NIER) | Chenhao Wei Stevens Institute of Technology, Gengwu Zhao Stevens Institute of Technology, Xinyi Li Stevens Institute of Technology, Billy Ye Stevens Institute of Technology, Lu Xiao Stevens Institute of Technology
16:15 | 15m Talk | MAJIT: Just-in-Time Detection of Compatibility Issues in Android and iOS Apps through Large Language Model-based Multi-Agent Collaboration | New Ideas and Emerging Results (NIER) | Jiaqi Wang Xidian University, Di Cui Xidian University, Shenghan Liu Douyin, Qiankang Mao Douyin, xiangxingqian Douyin, Qiaoyin Gan Douyin, Rui Li
16:30 | 15m Talk | Code Wars: Adversarial Self-Play for Evolving Software Validation Tools | New Ideas and Emerging Results (NIER)
16:45 | 15m Talk | SEAlign: Alignment Training for Software Engineering Agent (Distinguished Paper Award) | Research Track | Kechi Zhang Peking University, China, Huangzhao Zhang Verdent AI, Ge Li Peking University, Jinliang You Peking University, Jia Li, Yunfei Zhao Peking University, Zhi Jin Peking University, Wuhan University
17:00 | 15m Talk | Atomizer: An LLM-based Collaborative Multi-Agent Framework for Intent-Driven Commit Untangling | Research Track | Kangchen Zhu National University of Defense Technology, Zhiliang Tian National University of Defense Technology, Shangwen Wang National University of Defense Technology, Mingyue Leng National University of Defense Technology, Xiaoguang Mao National University of Defense Technology
17:15 | 15m Talk | Enhancing Issue Localization Agent with Tool-Interactive Training | Research Track | Zexiong Ma Peking University, Chao Peng ByteDance, Qunhong Zeng Beijing Institute of Technology, Pengfei Gao ByteDance, Yanzhen Zou Peking University, Bing Xie Peking University | Pre-print