Wed 25 May 2016, 16:30 - 17:00, at Snijderszaal - Session 4 - Chair(s): Sebastian Erdweg

Experimental evaluation is key to systems research. Because modern systems are complex and non-deterministic, good experimental methodology demands that researchers account for uncertainty. To obtain valid results, they are expected to run many iterations of benchmarks, invoke virtual machines (VMs) several times, or even rebuild VM or benchmark binaries more than once. All this repetition costs time to complete experiments. Currently, many evaluations give up on sufficient repetition or rigorous statistical methods, or even run benchmarks only in training sizes. The results reported often lack proper variation estimates and, when a small difference between two systems is reported, some of these results are simply unreliable.

In contrast, we provide a statistically rigorous methodology for repetition and for summarising results that makes efficient use of experimentation time. Time efficiency comes from two key observations. First, a given benchmark on a given platform typically exhibits much less non-determinism than the worst cases reported in published corner-case studies. Second, repetition is most needed where most uncertainty arises (whether between builds, between executions, or between iterations). We capture experimentation cost with a novel mathematical model, which we use to identify the number of repetitions at each level of an experiment that is necessary and sufficient to obtain a given level of precision.
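The abstract does not spell out the cost model, but the standard optimal-allocation result for two-level nested experimental designs conveys the idea of repeating most where uncertainty is cheap to reduce. The Python sketch below illustrates that result only; it is not the paper's actual model, and the function name, costs, and variance components are all hypothetical.

```python
# Sketch: optimal repetition allocation in a two-level experiment
# (benchmark iterations nested inside VM executions). This is NOT the
# paper's exact model; it illustrates the classic nested-design result
# the abstract alludes to. All names and numbers are hypothetical.
import math

def optimal_iterations_per_execution(c_iter, c_exec, s2_iter, s2_exec):
    """Iteration count per execution that minimises the variance of the
    grand mean for a fixed time budget.

    c_iter:  cost (seconds) of one benchmark iteration
    c_exec:  fixed cost (seconds) of starting one VM execution
    s2_iter: variance component between iterations (same execution)
    s2_exec: variance component between executions
    """
    # Repeat more at the level that is cheap and noisy, less at the
    # level that is expensive and stable.
    return max(1, round(math.sqrt((c_exec * s2_iter) / (c_iter * s2_exec))))

# Hypothetical numbers: an iteration costs 0.5 s, a VM start costs 20 s,
# and executions vary four times less than iterations do.
n_iter = optimal_iterations_per_execution(0.5, 20.0, 4.0, 1.0)
print(f"run {n_iter} iterations per VM execution")  # -> 13
```

The remaining time budget then determines how many executions (and, above them, builds) to run; per the abstract, the paper's model generalises this trade-off across all levels of an experiment to reach a given level of precision.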

Wed 25 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:30 - 17:00  Session 4 (Workshop) at Snijderszaal
               Chair(s): Sebastian Erdweg (TU Delft)

15:30  Talk (30m)  Resilient and Elastic APGAS
                   Olivier Tardieu (IBM Research)
16:00  Talk (30m)  A gradual typing throwdown
                   Jan Vitek (Northeastern University)
16:30  Talk (30m)  Rigorous Benchmarking in Reasonable Time
                   Richard Jones (University of Kent), Tomas Kalibera (Northeastern University)