ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Mon 28 Apr 2025 11:00 - 12:00 at 104 - Keynote 2 and Paper Presentations 1 Chair(s): Vincenzo Riccio

Benchmarks are our measure of progress. Or are they?

How do we know how well our tool solves a problem, like bug finding, compared to other state-of-the-art tools? We run a benchmark. We choose a few representative instances of the problem, define a reasonable measure of success, and identify and mitigate various threats to validity. Finally, we implement (or reuse) a benchmarking framework and compare the results for our tool with those for the state of the art.

For many important software engineering problems, we have seen new sparks of interest and serious progress made whenever a (substantially better) benchmark became available. Benchmarks are our measure of progress. Without them, we have no empirical support for our claims of effectiveness. Yet, time and again, we see practitioners disregard entire technologies as “paper-ware”, far from solving the problem they set out to solve.

In this keynote, I will discuss our recent efforts to systematically study the degree to which our evaluation methodologies allow us to measure those capabilities that we aim to measure. We shed new light on a long-standing dispute about code coverage as a measure of testing effectiveness, explore the impact of the specific benchmark configuration on the evaluation outcome, and call into question the actual versus measured progress of an entire field (ML4VD) just as it gains substantial momentum and interest.

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Keynote 2 and Paper Presentations 1 (SBFT) at 104
Chair(s): Vincenzo Riccio University of Udine
11:00
60m
Keynote
Keynote by Marcel Böhme
SBFT
Marcel Böhme MPI for Security and Privacy
12:00
15m
Research paper
DeepUIFuzz: A Guided Fuzzing Strategy for Testing UI Component Detection Models
SBFT
Proma Chowdhury University of Dhaka, Kazi Sakib Institute of Information Technology, University of Dhaka
12:15
15m
Research paper
On Evaluating Fuzzers with Context-Sensitive Fuzzed Inputs: A Case Study on PKCS#1-v1.5
SBFT
S Mahmudul Hasan Syracuse University, Polina Kozyreva Syracuse University, Endadul Hoque Syracuse University