The following two tutorials will be offered. Click to jump to the details:
- Tutorial 1: Beyond A/B Testing - Orthogonal and Experimental Designs
- Tutorial 2: How (not) to Analyze Software Engineering Experiments: From Anti-Patterns to Solutions
Tutorial 1: Beyond A/B Testing - Orthogonal and Experimental Designs
Presenters: Fabio Massacci (University of Trento, Italy), Aurora Papotti (Vrije Universiteit, Netherlands), Katja Tuma (Eindhoven University of Technology, Netherlands)
Abstract
Designing A/B testing experiments is relatively easy if you have plenty of subjects. What if you have many more design knobs than just two conditions, or can only ask relatively few developers/users to test your solutions? This tutorial will help ICSE attendees gain a hands-on understanding of designing combinatorial experiments in Software Engineering. It will be useful for PhD students, researchers, and industrial practitioners with all levels of experience, including those conducting experimental studies with human subjects for the first time. The tutorial is based on the significant experience of the research group and will draw on plenty of examples. It includes interactive hands-on exercises with the learning objectives of designing, conducting, and analyzing one’s own experimental study. The scientific underpinnings are partly available in the paper “Addressing Combinatorial Experiments and Scarcity of Subjects by Provably Orthogonal and Crossover Experimental Designs” (JSS’24).
Target audience
- The tutorial will be useful for researchers, both from academia and industry, conducting all types of experimental studies.
- It will be suitable to researchers with all levels of experience with designing experiments, including those looking to attempt it for the first time. This can include Masters (by research), PhD students, and new and experienced researchers in academia and industry.
- Some basic exposure to research is expected, but no prior experience in Design of Experiments (DoE) is necessary. There are no prerequisites for this tutorial beyond some basic knowledge of Excel and Python.
Learning objectives
- Understanding the limitations of full factorial designs in the context of software and security engineering experiments.
- Identifying when and how to apply orthogonal balanced designs to reduce the number of experimental configurations while maintaining statistical validity.
- Learning how to construct and use crossover balanced designs by re-using subjects across different scenarios, while mitigating bias and learning effects (a minimal construction sketch follows this list).
- Applying statistical reasoning to evaluate the impact of these designs on means, variances, and significance testing.
- Gaining familiarity with the artefacts and computational rules provided (e.g., Excel/Colab tools) for plug-and-play adoption in experimental practice.
- Exploring examples from software and security engineering to see how these designs improve feasibility and result quality in real-world studies.
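To give a concrete flavour of the constructions covered, here is a minimal Python sketch (our illustration, not the tutorial's Excel/Colab artefact) of a balanced crossover assignment built from a cyclic Latin square; the treatment labels and subject identifiers are made up:

```python
# Minimal sketch: a balanced crossover assignment built from a cyclic Latin square.
# Each row is a sequence of treatments that a group of subjects follows across
# periods, so every treatment appears exactly once per period and once per sequence.

def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated by i positions."""
    k = len(treatments)
    return [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]

def assign_subjects(subjects, treatments):
    """Spread subjects evenly over the sequences (rows) of the Latin square."""
    square = latin_square(treatments)
    return {subject: square[idx % len(square)] for idx, subject in enumerate(subjects)}

if __name__ == "__main__":
    treatments = ["A", "B", "C"]               # e.g. three tool/task conditions
    subjects = [f"S{i}" for i in range(1, 7)]  # six participants, two per sequence
    for subject, sequence in assign_subjects(subjects, treatments).items():
        print(subject, "->", " then ".join(sequence))
```

Because every treatment appears once in every period and once in every sequence, subjects can be reused across scenarios while order and period effects stay balanced; the designs discussed in the tutorial go beyond this simple sketch.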
Hands-on Exercises
- The tutorial will include hands-on experience with designing experiments in the context of security software engineering.
- We will present an industry case scenario, and the attendees will have the opportunity to discuss, in groups of three or four people, a suitable design of experiment to address the presented problem by applying the concepts learned in the first part of the tutorial. We will then engage the audience to think carefully about which supporting material is necessary to carry out their experimental study.
- We will then propose a solution to the presented problem by showing data we collected when conducting our own study, and the attendees will have the opportunity to work with these data and learn how to perform data analysis for experimental studies with human subjects. A laptop with any tool supporting Excel files and Google Colab is required for the hands-on exercises.
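To illustrate the kind of analysis the exercise targets, a within-subject comparison in a Colab notebook could look like the sketch below; the column names, values, and file name are hypothetical and do not come from the tutorial's dataset or prescribe its analysis method:

```python
# Sketch of a within-subject comparison on crossover-style data.
# All column names and values are illustrative; the exercise uses its own
# Excel/Colab dataset, e.g. loaded via pd.read_excel("results.xlsx").
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "subject":   ["S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4"],
    "treatment": ["A",  "B",  "A",  "B",  "A",  "B",  "A",  "B"],
    "score":     [7,    9,    5,    6,    8,    8,    6,    9],
})

# One row per subject, one column per treatment, so the comparison is paired.
wide = df.pivot(index="subject", columns="treatment", values="score")
t_stat, p_value = stats.ttest_rel(wide["A"], wide["B"])
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")
```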
Outline
- An Introduction to Design of Experiments
- Terminology and notation used in the field of DoE
- Presentation of three different types of design of experiments: full factorial, balanced (orthogonal and crossover)
- Introduction of a case study in the domain of security engineering for the hands-on exercise sessions
- Our artefact implemented to assist researchers in designing balanced experiments
- Our solution to the problem presented in the case study introduced earlier
- Material necessary to carry out an experimental study
- Understanding and analyzing data collected in an experimental study
- Questions and answers, to address attendee queries throughout and at the end
Bios of the presenters
Fabio Massacci is a professor at the University of Trento, IT and Vrije Universiteit, NL. He received a Ph.D. in computing from the University of Rome “La Sapienza.” Fabio Massacci received the IEEE Requirements Engineering Conference Ten Year Most Influential Paper Award on security in sociotechnical systems, for which he carried out several experiments with students and practitioners. He is a named co-author of the recently released industry standard CVSS (Common Vulnerability Scoring System) v4.0. Fabio has also designed and performed with his collaborators a number of experiments with students and practitioners on CVSS scoring. He coordinated several EU projects and is currently leading the Horizon Europe project Sec4AI4Sec (Security for AI Augmented Systems). This tutorial is the ‘all you wanted to know’ version of an MSc course on experimental design he has been teaching at UTrento and VU Amsterdam for the past few years. He is a Member of IEEE, the ACM, and the Society for Risk Analysis. Contact him at fabio.massacci@ieee.org.
Aurora Papotti is a Ph.D. student at Vrije Universiteit, NL. She attended the EIT Digital Master School in Cyber Security and received her joint master’s degree in informatics with distinction from the University of Turku, FI, and the University of Trento, IT. Her main research interest is the experimental assessment of tools that automatically detect and fix vulnerabilities, and how developers cope with these tools while using them for code review. She has worked as a teaching assistant for the course on experimental design at VU Amsterdam for the past two years. Contact her at a.papotti@vu.nl.
Katja Tuma is an Assistant Professor at Eindhoven University of Technology within the SET cluster. She obtained her Ph.D. in Computer Science and Engineering from the University of Gothenburg. She is co-founder of the Dutch national working group on AI for security and security for AI, co-organizer of the international workshop DeMeSSAI, and co-founder of Hack4Her, a women-focused hackathon. She has developed training courses and led several experimental projects in software engineering, security, and human aspects. She is a Member of IEEE. Contact her at k.tuma@tue.nl.
Tutorial 2: How (not) to Analyze Software Engineering Experiments: From Anti-Patterns to Solutions
Presenter: Sira Vegas, Universidad Politécnica de Madrid, Madrid, Spain
Abstract
Experimentation is a key aspect of science and engineering, yet it remains one of the major stumbling blocks in software engineering. Although many experiments are conducted today, ensuring their quality—whether they involve human subjects or not—remains a persistent concern for the trustworthiness of results. Although researchers have raised concerns about the correct use of statistical methods for many years, these issues often persist due to two main factors: the inherent complexity of empirical studies in our field, and the unique characteristics of software engineering, which lead to some experimentation issues being conceived differently than in other disciplines. This tutorial focuses on the analysis of experimental data, helping participants avoid common pitfalls and anti-patterns while improving the quality and reliability of their results. It is not intended as a data analysis course, but rather reviews key issues identified in published software engineering experiments, providing guidance based on over 25 years of experience running experiments.
Target audience
This tutorial focuses on methodological aspects. Therefore, the primary audience is researchers (ranging from PhD students to early-career researchers to senior researchers). However, industrial participants with experience in, or an interest in, running experiments may also find it relevant.
Learning objectives
- Understand why inferential statistics are essential for valid SE experiments.
- Learn to select appropriate data analysis techniques.
- Recognize and correctly include important design variables in the analysis.
- Know how to handle situations when data do not meet the assumptions of a parametric test.
- Understand how to interpret experimental results in terms of statistical significance, effect size, and power.
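For the last objective, here is a minimal Python sketch (synthetic data, our illustration rather than the tutorial's material) of reporting statistical significance, effect size, and power together instead of a p-value alone:

```python
# Sketch: report significance, effect size, and power together.
# The two groups are synthetic; in a real experiment they would be measured outcomes.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(1)
control = rng.normal(loc=70, scale=10, size=20)
treated = rng.normal(loc=75, scale=10, size=20)

t_stat, p_value = stats.ttest_ind(treated, control)

# Cohen's d from the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohen_d = (treated.mean() - control.mean()) / pooled_sd

# Achieved power for this effect size and sample size at alpha = 0.05.
power = TTestIndPower().power(effect_size=cohen_d, nobs1=len(treated), alpha=0.05)
print(f"p = {p_value:.3f}, Cohen's d = {cohen_d:.2f}, power = {power:.2f}")
```

A non-significant p-value obtained with low power, or a significant one with a negligible effect size, would be interpreted very differently, which is the point of the last objective above.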
Outline
- (1) The need for inferential statistics. SE experiments sometimes lack inferential statistics, which can seriously undermine the validity of the results. The tutorial will discuss why they are necessary (30 minutes).
- (2) Choosing the right data analysis technique. Data analysis is driven by the experimental design, and the selected technique should be determined accordingly. However, the choice is not straightforward, as several issues must be taken into consideration. These issues will be discussed (30 minutes).
- (3) Accounting for design variables. Care must be taken to include all relevant design variables in the analysis. The tutorial will focus on two designs that are often improperly analyzed: blocked and crossover designs (30 minutes).
- (4) When parametric assumptions fail. Parametric tests are more powerful than non-parametric alternatives and are commonly used to analyze factorial experiments. However, data do not always satisfy the assumptions required by these tests. The tutorial will discuss the options available when assumptions are violated, including data transformations and non-parametric tests (30 minutes); a minimal sketch of such a fallback appears after this outline.
- (5) The 3 musketeers: statistical significance, effect size, and power. The tutorial will cover the meaning and implications of these three parameters, how they relate to each other, and how they should be used to properly interpret experimental results. For example, non-significant results may be due to low power, while statistically significant results may be practically irrelevant when effect sizes are small (30 minutes).
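As referenced in topic (4) above, here is a minimal sketch (synthetic data; the tutorial covers the trade-offs properly) of checking a parametric assumption and falling back to a non-parametric test:

```python
# Sketch: check normality and fall back to a non-parametric test if it fails.
# The samples are synthetic and deliberately skewed; a data transformation
# (e.g. a log transform) would be another option discussed in the tutorial.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.exponential(scale=2.0, size=15)
group_b = rng.exponential(scale=3.0, size=15)

# Shapiro-Wilk normality check on each group.
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group_a, group_b))

if normal:
    stat, p = stats.ttest_ind(group_a, group_b)      # parametric route
    print(f"t-test: p = {p:.3f}")
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)   # non-parametric fallback
    print(f"Mann-Whitney U: p = {p:.3f}")
```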
Bio of the presenter
Sira Vegas is a Full Professor at the Universidad Politécnica de Madrid. Her main research interests are experimental software engineering and software testing. She was General Co-Chair of EASE’23, Program Co-Chair of ESEM’07, and a member of its Steering Committee (2006-08). Sira has been Steering Committee Chair of the International Software Engineering Research Network (ISERN) since 2023. She has been Program Co-Chair of the RENE track of ICPC’24 and SANER’24, and Program Chair of the Journal-First track of PROFES’23. Sira was Program Co-Chair of ICSE-DS’21 and participated in the organization of CSEE&T’03. She has been a PC member of different ICSE tracks since 2012 (research, NIER, SEET, SEIP, DS, Demos, Artifacts, and SRC), and of other conferences such as ESEM (since 2008), ASE (since 2021), and MSR (since 2023). She has been a regular reviewer of IEEE Transactions on Software Engineering since 2011 and was a member of its review board (2019-2024). She has also been a regular reviewer of the Empirical Software Engineering Journal since 2006.
Sira Vegas has been teaching data analysis in the “Experimental Software Engineering” course of the European Master in Software Engineering for over 15 years. She is a co-author of the book chapter “A Course on Experimentation in Software Engineering: Focusing on Doing” in the Handbook on Teaching Empirical Software Engineering (Springer).