Keynote 1: Adversarial Benchmarking: An Urgent Call for a Stronger Signal of Technological Progress

Speaker: Marcel Böhme (MPI-SP, Germany)
Time and Location: 9:30 AM, May 19 (Tuesday) @ Room 201
Abstract: Benchmarking has always been our yardstick for evaluating technological progress. Today, we publish new coding agents, new testing techniques, and generally new tools for important problems enthusiastically with strong results on the most recent benchmarks. These results are offered as empirical evidence in support of our claims of superiority: “This is the new state-of-the-art”. These days, the best-performing LLMs and agentic frameworks go viral on a regular basis. However, recently there has been increasing skepticism of benchmarking as our primary methodology for evaluating technological progress: Overfitting. Results over insight. Unreliable measures of effectiveness. Expectation bias in the peer review process. An entire community misled by an invalid problem statement. Do we need better benchmarks? Or do we need a fundamentally different approach?
In this keynote, I will argue that we must—with the same enthusiasm—explicitly identify and evaluate the unique limitations of the techniques we develop. How else do we learn where to improve? In fact, this is precisely the perspective that the Software Testing research community takes on its own subject: We want to find bugs! We don’t want to confirm that a system is effectively achieving its purpose. That would be the least efficient way to establish where more progress is needed. Benchmarking is like providing a test suite with a Protocol RFC that is supposed to find bugs in every future protocol implementation. To evaluate new technology properly, we must stop seeking to confirm its effectiveness and start failing to find counterexamples. This is our call to develop the scientific standards for evaluating technological progress for all of computer science. We must set new expectations for peer review to allow, invite, and indeed request a fair analysis of the unique limitations of the techniques we propose. I will provide several concrete examples and mechanisms for how we can facilitate this “adversarial benchmarking” and call on the Software Testing research community to develop these ideas so as to ultimately facilitate a sound technological progress.
Biography: Marcel Böhme is a faculty member at the Max Planck Institute for Security and Privacy (MPI-SP). His Software Security group has made foundational contributions to automatic software testing, specifically fuzzing, which has become one of the most successful techniques for automatic vulnerability discovery at scale. While conventional wisdom has it that testing can only show the presence of bugs but never their absence, Marcel has developed the first statistical framework to make statements about a program’s correctness after an error-less testing campaign. While testing is embarrassingly parallel, his probabilistic theory explains how the cost of bug finding is actually exponential in the number of machines, and when even the most effective systematic testing technique is outperformed by a simple, random approach. More recently, his group has been developing the statistical and causal foundations of empirical software security analysis at scale, supported by an ERC Consolidator grant. To find out more about the research in our group, head over to https://mpi-softsec.github.io
Keynote 2: Failure-Based Testing

Speaker: Tsong Yueh Chen (Swinburne University of Technology, Australia)
Time and Location: 9:00 AM, May 20 (Wednesday) @ Room 201
Abstract: Every program has a corresponding input domain. Thus, every faulty program has an input domain with failure patterns, which consist of failure-causing inputs. We define failure-based testing as testing methods that make use of information about various aspects of failure patterns, such as their shapes, sizes, orientations and numbers. With such information, we are able to design new testing methods and to better understand the inter-relations between some apparently unrelated results. Information about failure patterns not only helps to test programs but can also help with fault localisation and program repair.
Adaptive random testing is a failure-based testing method. However, adaptive random testing only makes use of one aspect of failure patterns, namely the contiguity of failure-causing inputs. White and Cohen’s domain strategy (1980), which was proposed as a fault-based testing method, can also be regarded as a failure-based testing method. Apart from sharing some interesting and important software testing results, we would like to address the potential and future of failure-based testing, in particular in the era of AI. We believe failure-based testing offers many research opportunities.
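To illustrate how contiguity of failure-causing inputs can be exploited, here is a minimal sketch of the fixed-size-candidate-set variant of adaptive random testing, one common formulation of the method. It is not taken from the talk: the function names, the unit-square input domain, the candidate-set size, and the square failure region are all assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fscs_art(is_failure, dim=2, candidates=10, max_tests=500, seed=0):
    """Fixed-size-candidate-set ART over the unit hypercube [0, 1)^dim.

    Returns the number of tests executed up to and including the first
    failure, or None if no failure is found within max_tests.
    """
    rng = random.Random(seed)
    executed = []
    # The first test case is chosen purely at random.
    test = tuple(rng.random() for _ in range(dim))
    for count in range(1, max_tests + 1):
        if is_failure(test):
            return count
        executed.append(test)
        # Draw a fixed number of random candidates and keep the one whose
        # nearest already-executed test is farthest away. This spreads
        # tests out, which pays off when failures form contiguous regions.
        pool = [tuple(rng.random() for _ in range(dim)) for _ in range(candidates)]
        test = max(pool, key=lambda c: min(sq_dist(c, e) for e in executed))
    return None


def in_block(point):
    """Hypothetical block failure pattern: a 0.1 x 0.1 square region."""
    x, y = point
    return 0.4 <= x < 0.5 and 0.4 <= y < 0.5


print("tests until first failure:", fscs_art(in_block))
```

Each new test is compared against every executed test, so the cost grows quadratically with the number of tests; the sketch trades that overhead for a more even spread over the input domain.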
Biography: Tsong Yueh Chen studied at The University of Hong Kong, Imperial College London and The University of Melbourne. He is currently a Professor of Software Engineering at Swinburne University of Technology, Australia. His main research interest is in software testing. Chen is the inventor of adaptive random testing and metamorphic testing. He is the recipient of the 2024 ACM SIGSOFT Outstanding Research Award. His paper, Adaptive Random Testing: the ART of Test Case Diversity (with F.-C. Kuo, R. G. Merkel and T. H. Tse), was awarded the Grand Champion of the 2010 Most Influential Paper for Journal of Systems and Software in 2021. In 2000, Chen and Professor T. H. Tse of The University of Hong Kong co-founded the Asia-Pacific Conference on Quality Software (APAQS), which was known as the International Conference on Quality Software (QSIC) from 2002 to 2014.
Keynote 3: Keeping up with the Abstraction

Speaker: Shin Yoo (KAIST, Republic of Korea)
Time and Location: 9:00 AM, May 21 (Thursday) @ Room 201
Abstract: A spectre is haunting the software engineering research community - the spectre of a new abstraction layer that is LLMs and AI agents. All the powers of our community have entered into a great conversation to discuss this spectre: the machinists, who see a future in which the majority of software, if not all, is written by AI agents, and the artisans, who think the machines lack the finesse of experienced human architects. But where are the studies on how to engineer the new abstraction layer itself? Where is the concrete understanding of the new engine that drives this new form of computation? Two things result from this fact: I. LLMs and Agents are already acknowledged by both machinists and artisans to be a powerful new tool for software engineering, and, II. it is high time that we start looking into the new abstraction layer and treat the layer itself as an engineering target. To this end, the talk will discuss some of the early findings and lay out future research directions.
Biography: Shin Yoo is a professor of software engineering at the Korea Advanced Institute of Science and Technology. He received his PhD from King’s College London in 2009, supervised by Prof. Mark Harman; before his tenure at KAIST, he held a lectureship at University College London (UCL). His research primarily focuses on software testing, debugging, and the application of computational intelligence, with a recent focus on both the use and the testing of Large Language Models and AI agents. He was the program co-chair of ICST 2018, served as the chair of ICST steering committee from 2018 to 2023, and recently was the general chair of ASE 2025 held in Seoul, Korea.
Keynote
| Title | Speaker |
|---|---|
| Adversarial Benchmarking: An Urgent Call for a Stronger Signal of Technological Progress | Marcel Böhme |
| Failure-Based Testing | Tsong Yueh Chen |
| Keeping up with the Abstraction | Shin Yoo |
This program is tentative and subject to change.
Tue 19 May (displayed time zone: Seoul)
| Start | Duration | Type | Event | Track | People |
|---|---|---|---|---|---|
| 09:00 | 30m | Talk | Opening | ICST 2026 | Moonzoo Kim (KAIST / VPlusLab Inc.), Shin Hong (Chungbuk National University), Neil Walkinshaw (The University of Sheffield), Xiaoyuan Xie (Wuhan University) |
| 09:30 | 60m | Keynote | Adversarial Benchmarking: An Urgent Call for a Stronger Signal of Technological Progress | Keynote | |
| 12:30 | 90m | Lunch | Lunch | Catering | |
| 18:00 | 2h | Social Event | Reception | ICST 2026 | |
Wed 20 May
| Start | Duration | Type | Event | Track | People |
|---|---|---|---|---|---|
| 09:00 | 60m | Keynote | Failure-Based Testing | Keynote | |
| 17:00 | 30m | Meeting | Open Steering Committee Meeting | ICST 2026 | Gregory Gay (Chalmers University of Technology and University of Gothenburg), Sebastiano Panichella (University of Bern) |
| 18:00 | 2h | Social Event | Banquet | ICST 2026 | |
Thu 21 May
| Start | Duration | Type | Event | Track | People |
|---|---|---|---|---|---|
| 09:00 | 60m | Keynote | Keeping up with the Abstraction | Keynote | |
| 10:00 | 30m | Coffee break | Break | Catering | |
| 10:30 | 15m | Talk | ICST 2027 Presentation | ICST 2026 | |
| 10:45 | 15m | Talk | Most Influential Paper Award | MIP Award | |
| 11:00 | 30m | Talk | Most Influential Paper Award Presentation | MIP Award | |
| 11:30 | 90m | Lunch | Lunch | Catering | |