When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications
This program is tentative and subject to change.
Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. While cloud environments offer several advantages for running benchmarks, it is often reported that benchmark results can vary significantly between repetitions—making it difficult to draw reliable conclusions about real-world performance.
In this paper, we empirically quantify the impact of cloud performance variability on benchmarking results, focusing on stream processing applications as a representative type of data-intensive, performance-critical system. In a longitudinal study spanning more than three months, we repeatedly executed an application benchmark used in research and development at Dynatrace. This allows us to assess various aspects of performance variability, particularly concerning temporal effects. With approximately 591 hours of experiments, deploying 789 Kubernetes clusters on AWS and executing 2366 benchmarks, this is likely the largest study of its kind and the only one addressing performance from an end-to-end, i.e., application benchmark perspective.
Our study confirms that performance variability exists, but it is less pronounced than often assumed (coefficient of variation < 3.7%). In contrast to related studies, we find that performance does exhibit daily and weekly patterns, albeit with only small variability (≤ 2.5%). Reusing benchmarking infrastructure across multiple repetitions reduces result accuracy only slightly (≤ 2.5 percentage points). These key observations hold consistently across different cloud regions and machine types with different processor architectures. We conclude that for engineers and researchers focused on detecting substantial performance differences (e.g., > 5%) in their application benchmarks, which is often the case in software engineering practice, performance variability and the precise timing of experiments are far less critical.
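For readers who want to relate their own benchmark repetitions to the numbers reported above: the coefficient of variation is simply the standard deviation of the repeated results divided by their mean. The following Python sketch uses made-up throughput values and an illustrative 5% detection threshold (neither is taken from the paper's artifacts) to show how such a figure could be computed and interpreted.

```python
from statistics import mean, stdev

# Hypothetical throughput results (e.g., records/s) from repeated runs of the
# same application benchmark on freshly provisioned cloud infrastructure.
# These numbers are invented purely for illustration.
repetitions = [102_500, 101_900, 104_300, 103_100, 102_800, 101_600]

# Coefficient of variation (CV): sample standard deviation relative to the mean.
cv = stdev(repetitions) / mean(repetitions)
print(f"coefficient of variation: {cv:.2%}")

# If the performance differences of interest are large (e.g., > 5%), a CV in
# the low single digits, as reported in the paper, matters little for whether
# such differences can be detected reliably.
DETECTION_THRESHOLD = 0.05
print(f"variability below detection threshold: {cv < DETECTION_THRESHOLD}")
```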
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:30 - 12:30 | Performance
Tracks: Demonstrations / Research Papers / Ideas, Visions and Reflections / Journal First / Industry Papers
Room: Vega. Chair(s): Philipp Leitner (Chalmers | University of Gothenburg)

10:30 (20 min, Talk, Journal First): Accuracy Can Lie: On the Impact of Surrogate Model in Configuration Tuning
Pengzhou Chen (University of Electronic Science and Technology of China), Jingzhi Gong (University of Leeds), Tao Chen (University of Birmingham)

10:50 (20 min, Talk, Research Papers): Understanding Debugging as Episodes: A Case Study on Performance Bugs in Configurable Software Systems
Max Weber (Leipzig University), Alina Mailach (Leipzig University), Sven Apel (Saarland University), Janet Siegmund (Chemnitz University of Technology), Raimund Dachselt (Technical University of Dresden), Norbert Siegmund (Leipzig University)

11:10 (20 min, Talk, Research Papers): Towards Understanding Performance Bugs in Popular Data Science Libraries
Haowen Yang (The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen)), Zhengda Li (The Chinese University of Hong Kong, Shenzhen), Zhiqing Zhong (The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen)), Xiaoying Tang (The Chinese University of Hong Kong, Shenzhen), Pinjia He (The Chinese University of Hong Kong, Shenzhen)

11:30 (20 min, Talk, Industry Papers): When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications
Sören Henning (Dynatrace Research), Adriano Vogel, Esteban Pérez Wohlfeil (Dynatrace Research), Otmar Ertl (Dynatrace Research), Rick Rabiser (LIT CPS, Johannes Kepler University Linz)
DOI | Pre-print

11:50 (10 min, Talk, Demonstrations): LitmusKt: Concurrency Stress Testing for Kotlin
Denis Lochmelis (Constructor University Bremen, JetBrains Research), Evgenii Moiseenko (JetBrains Research), Yaroslav Golubev (JetBrains Research), Anton Podkopaev (JetBrains Research, Constructor University)
DOI | Pre-print

12:00 (10 min, Talk, Ideas, Visions and Reflections): Breaking the Loop: AWARE is the New MAPE-K

12:10 (20 min, Talk, Research Papers): COFFE: A Code Efficiency Benchmark for Code Generation
Yun Peng (The Chinese University of Hong Kong), Jun Wan (Zhejiang University), Yichen Li (The Chinese University of Hong Kong), Xiaoxue Ren (Zhejiang University)
Vega is close to the registration desk. Facing the registration desk, the entrance to Vega is on the left, near the hotel side entrance.