Crossover Designs in Software Engineering Experiments: Review of the State of Analysis (ESEIW 2024 - ESEM Emerging Results, Vision and Reflection Papers Track)

Who

Julian Frattini, Davide Fucci, Sira Vegas

Track

ESEIW 2024 ESEM Emerging Results, Vision and Reflection Papers Track

When

Thu 24 Oct 2024 12:15 - 12:30 at Telensenyament (B3 Building - 1st Floor) - Empirical research methods Chair(s): Stefan Wagner

Abstract

Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of obtained data and deals with subject variability but introduces threats to internal validity such as learning and carryover effects. Vegas et al. (2015) reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address these threats during data analysis while still harnessing the design's benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover-design experiments in SE publications between 2015 and 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design experiments and evaluate their data analysis against the provided guidelines. The results show that the validity of data analyses has improved compared to the original state of analysis. Still, despite the explicit guidelines, only 29.5% of all threats to validity were addressed properly. While the maturation and the optimal sequence threats are properly addressed in 35.8% and 38.8% of all studies in our sample, respectively, the carryover threat is modeled in only about 3% of the observed cases. The lack of adherence to the analysis guidelines threatens the validity of the conclusions drawn from crossover-design experiments.
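
To make the threats named in the abstract concrete, the following is a minimal sketch of the textbook point estimates for the simplest crossover layout, a 2x2 (AB/BA) design. It is not the analysis procedure from the guidelines or from the paper. The scores, the sequence labels "AB" and "BA", and the function name crossover_estimates are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical scores: (period 1, period 2) per subject, grouped by sequence.
# Sequence "AB" applies treatment A first, then B; "BA" is the reverse.
scores = {
    "AB": [(12.0, 9.0), (14.0, 10.0), (10.0, 8.0)],
    "BA": [(8.0, 12.0), (9.0, 14.0), (10.0, 13.0)],
}


def crossover_estimates(data):
    """Textbook point estimates for a 2x2 (AB/BA) crossover design."""
    # Mean within-subject difference (period 1 minus period 2) per sequence.
    # Differencing removes each subject's own baseline.
    diff = {seq: mean(p1 - p2 for p1, p2 in obs) for seq, obs in data.items()}
    # Mean within-subject total per sequence.
    total = {seq: mean(p1 + p2 for p1, p2 in obs) for seq, obs in data.items()}
    return {
        # Treatment effect A - B; biased if carryover differs between A and B.
        "treatment_A_minus_B": (diff["AB"] - diff["BA"]) / 2,
        # Period effect (e.g. learning between period 1 and period 2).
        "period_1_minus_2": (diff["AB"] + diff["BA"]) / 2,
        # Difference in subject totals between sequences. In a 2x2 design this
        # mixes carryover with differences between the two groups of subjects.
        "sequence_AB_minus_BA": total["AB"] - total["BA"],
    }


estimates = crossover_estimates(scores)
for name, value in estimates.items():
    print(f"{name}: {value:+.2f}")
```

The sketch yields point estimates only. A full analysis would also need uncertainty estimates, for example from a mixed-effects model with the subject as a random effect, which is beyond this illustration.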

Link to Publication

https://doi.org/10.1145/3674805.3690754

Link to Preprint

https://arxiv.org/abs/2408.07594

Julian Frattini

Blekinge Institute of Technology

Sweden

Davide Fucci

Blekinge Institute of Technology

Sweden

Sira Vegas

Universidad Politecnica de Madrid

Spain

Replication Package