TITLE: Analytical Observational Studies in Software Engineering

DURATION: Full day

DESCRIPTION:

Mining Software Repositories (MSR) studies have become popular over the years. However, because MSR studies use observational data, they lack the level of control required to identify causality. Consequently, papers report correlational results at best and state that causality cannot be established. This is a major limitation of MSR studies. Other disciplines have addressed this issue by developing strategies that, by replacing control with choice, come closer to revealing causality in observational studies. The objective of the school is to introduce participants to Analytical Observational Studies (AOS), a type of study that can also be applied in the context of MSR. After the school, participants should be able to conduct an AOS by following the steps and techniques taught.

The school will explore the concepts of correlation and causation, along with the requirements that a study should meet to be able to identify causality. Special attention will be given to the concept of extraneous variables, as it is central in running AOS. The main content of the school is how to design and analyze an AOS.

The school consists of lectures and an illustrative example that will be run in parallel. The lectures introduce the concepts, while in the illustrative example the participants are divided into smaller groups in which they apply the introduced concepts to a real MSR study. The goal is to give the participants the ability to apply AOS in their own research.


Aims and Objectives

The school aims to bring the study of cause-effect relationships into the Mining Software Repositories (MSR) field by teaching participants to run Analytical Observational Studies (AOS). This aim will be achieved by means of three objectives:

  • O1. Understand the difference between correlation and causation, and its implications.
  • O2. Learn how AOS can be used in MSR.
  • O3. Apply AOS methods to MSR studies.

The school will have an important practical component.


Outline of the covered topics

The school will be organized in the following way:

1. Introduction

Correlation does not imply causation. The school starts by exploring the differences between the two and discussing the principles that must be met to establish a causal relationship (temporal precedence, association, and non-spuriousness). It then recalls what controlled experiments in SE are, their distinctive features (control and causality), and the mechanisms they use to establish causality: independent variable manipulation, local control, randomization, and data replication (several data points are needed, since a single data point is not generalizable).

Finally, some motivational examples are presented that compare MSR studies with controlled experiments and illustrate the current problems when studying causality in MSR studies. These examples will be used to present the aim of the school, along with its learning goals.
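
The non-spuriousness principle can be made concrete with a small simulation. The sketch below is illustrative only: the variable names (size, reviews, defects) and the data-generating process are assumptions invented for this example, not data from any MSR study. Both measured variables depend on a shared extraneous variable and not on each other, so they are correlated without any causal link. Removing the linear effect of the extraneous variable makes the association vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a simple least-squares regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(1)
n = 5000

# Hypothetical extraneous variable: standardized project size.
size = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: both variables are driven by size only, not by each other.
reviews = [s + rng.gauss(0, 1) for s in size]
defects = [s + rng.gauss(0, 1) for s in size]

# Raw association between the two variables of interest.
raw = pearson(reviews, defects)
# Association left after removing the linear effect of size from both
# (this is the partial correlation given size).
partial = pearson(residuals(reviews, size), residuals(defects, size))

print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {partial:.2f}")
```

In this simulated setting the raw correlation is clearly positive, while the partial correlation is close to zero: the association is entirely due to the extraneous variable.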

2. Observational Studies

Other empirical disciplines have developed methods for studying causal relationships from observational data. We will show the different types of studies used in medicine and introduce the example of epidemiology, which uses a type of observational study called analytical to look for cause-effect relationships when it is not possible to run controlled experiments.

Next, we will introduce a well-known success story of AOS in epidemiology: the identification of smoking as a cause of lung cancer. Experiments were never used to identify this causal relationship, since carrying them out would be ethically impossible: it would mean asking a large group of non-smokers to start smoking.

3. Illustrative Example

A real example is presented. Participants will be asked to plan an AOS in MSR following the given steps. The example will be run in parallel with the description of the steps.

4. Methods in AOS for SE

We will start by recalling the steps for conducting a controlled experiment in SE. Then, for each step in the experimental process, we will give an overview of the similarities and differences between experiments and AOS, explore the differences in detail, and discuss the methods used by AOS.

4.1 Hypothesis Formulation

Both experiments and AOS examine broad theories in narrow, focused, controlled circumstances. However, unlike in controlled experiments, the step from association to causation in AOS needs to be supported by making theories elaborate.

4.2 Variables Selection and Instrumentation

The types of variables involved in an AOS are the same as in a controlled experiment: independent, dependent, and extraneous. Special attention will be paid to extraneous variables, as they particularly affect AOS. Extraneous variables (or third variables) are variables that the researcher is not investigating but that can potentially affect the response variable of a study. We will discuss their importance and the different types that can be found. Finally, the data collection and measurement procedures need to be explained for each variable of interest.

4.3 Context and Subject Selection

We will introduce the aspects needed to describe the context of an AOS (setting, locations, and relevant dates). Next, we will focus on describing the different populations of interest. Finally, we will discuss the steps that have to be followed for selecting study subjects.

4.4 Design: Avoiding Extraneous Variables

Researchers might want to rule out some extraneous variables by counteracting their effect. This implies that it will not be possible to assess their effect during analysis. There are two techniques for doing this: restriction and matching. We will explain how to perform them and when to choose each one.
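
As a rough illustration of the two techniques, the sketch below applies restriction and exact one-to-one matching to a tiny hand-made dataset. Everything in it is hypothetical: the project records, the exposure (uses_ci), and the extraneous variables (language, size) are invented for the example, and the helper names restrict and match_exact are not taken from any library. Exact matching without replacement is only one of several possible matching strategies.

```python
from collections import defaultdict

# Hypothetical study units: software projects with an exposure (uses_ci)
# and two extraneous variables (language, size).
projects = [
    {"id": 1, "language": "Java", "size": "large", "uses_ci": True},
    {"id": 2, "language": "Java", "size": "large", "uses_ci": False},
    {"id": 3, "language": "Java", "size": "small", "uses_ci": True},
    {"id": 4, "language": "Python", "size": "small", "uses_ci": False},
    {"id": 5, "language": "Java", "size": "small", "uses_ci": False},
    {"id": 6, "language": "Python", "size": "large", "uses_ci": True},
    {"id": 7, "language": "Java", "size": "large", "uses_ci": True},
]


def restrict(units, variable, value):
    """Restriction: keep only units with one value of the extraneous variable."""
    return [u for u in units if u[variable] == value]


def match_exact(units, exposure, match_on):
    """Exact 1:1 matching without replacement.

    Each exposed unit is paired with an unexposed unit that has identical
    values on all variables in match_on. Exposed units with no available
    partner are left unmatched and dropped from the result.
    """
    pools = defaultdict(list)
    for u in units:
        if not u[exposure]:
            pools[tuple(u[v] for v in match_on)].append(u)

    pairs = []
    for u in units:
        if u[exposure]:
            key = tuple(u[v] for v in match_on)
            if pools[key]:
                pairs.append((u, pools[key].pop(0)))
    return pairs


# Restriction removes language as a source of variation altogether.
java_only = restrict(projects, "language", "Java")
# Matching balances size between exposed and unexposed projects.
pairs = match_exact(java_only, exposure="uses_ci", match_on=["size"])

for exposed, control in pairs:
    print(f"exposed {exposed['id']} matched with control {control['id']}")
```

Note the cost visible even in this toy case: restriction discards the Python projects, and matching discards the exposed project that has no unexposed partner of the same size.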

4.5 Analysis: Identifying Extraneous Variables

While the extraneous variables that have been restricted or matched do not have to be incorporated into the analysis, the remaining ones do. We will explain how this should be done. Additionally, AOS require examining the influence of potential unmeasured extraneous variables (mainly unknown ones, but possibly also known ones that were impossible to measure) on the causal conclusions. This is called sensitivity analysis. We will discuss how to perform it.
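
The description above does not name specific analysis methods, so the following sketch uses two techniques from epidemiology chosen here purely for illustration: the Mantel-Haenszel pooled risk ratio, as one way to incorporate a measured extraneous variable through stratification, and the E-value, as one simple form of sensitivity analysis for unmeasured extraneous variables. The counts and stratum names are hypothetical.

```python
import math

# Hypothetical counts per stratum of a measured extraneous variable
# (project size). Each tuple is:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "large": (40, 80, 10, 20),
    "small": (2, 20, 8, 80),
}


def crude_risk_ratio(strata):
    """Risk ratio that ignores the extraneous variable (strata collapsed)."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio pooled over the strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return num / den


def e_value(rr):
    """E-value for a risk ratio.

    Minimum strength of association (on the risk ratio scale) that an
    unmeasured extraneous variable would need with both exposure and
    outcome to fully explain away the observed risk ratio.
    """
    rr = max(rr, 1 / rr)  # protective effects are handled by inverting
    return rr + math.sqrt(rr * (rr - 1))


crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)

print(f"crude RR:            {crude:.2f}")
print(f"adjusted RR (M-H):   {adjusted:.2f}")
print(f"E-value of crude RR: {e_value(crude):.2f}")
```

In these made-up data the risk is identical for exposed and unexposed projects within each stratum, so the apparent effect in the crude risk ratio disappears once size is taken into account. The E-value asks the complementary question for variables that were not measured: how strong would such a variable have to be to account for an observed association on its own.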

4.6 Interpretation

Unlike in experiments, criticism must be exercised during this stage, and competing theories need to be evaluated. This is of great importance.

4.7 Validity Evaluation

The list of potential validity threats defined in epidemiology for this type of study, adapted to the SE context, will be presented here.

5. Wrap-up

The main issues raised during the school will be briefly highlighted, and additional questions and concerns from the participants will be answered.

Wed 25 Oct

Displayed time zone: Central Time (US & Canada)

08:30 - 10:00: Introduction and Observational Studies (IASESE Advanced School, at Oak Alley)
  • 08:30 (45m) Introduction. Sira Vegas (Universidad Politecnica de Madrid), Davide Taibi (University of Oulu and Tampere University)
  • 09:15 (45m) Observational Studies. Nyyti Saarimäki (Tampere University), Sira Vegas (Universidad Politecnica de Madrid)

10:30 - 12:00: Illustrative example and Methods in AOS: hypothesis formulation, variables selection, instrumentation (IASESE Advanced School, at Oak Alley)
  • 10:30 (45m) Illustrative Example. Valentina Lenarduzzi (University of Oulu), Davide Taibi (University of Oulu and Tampere University)
  • 11:15 (45m) Methods in AOS: hypothesis formulation, variables selection, instrumentation. Sira Vegas (Universidad Politecnica de Madrid), Nyyti Saarimäki (Tampere University), Davide Taibi (University of Oulu and Tampere University)

13:30 - 15:00: Methods in AOS: context, subject selection, design (IASESE Advanced School, at Oak Alley)
  • 13:30 (90m) Methods in AOS: context, subject selection, design. Valentina Lenarduzzi (University of Oulu), Sira Vegas (Universidad Politecnica de Madrid), Nyyti Saarimäki (Tampere University)

15:30 - 16:30: Methods in AOS: analysis, interpretation, validity evaluation (IASESE Advanced School, at Oak Alley)
  • 15:30 (60m) Methods in AOS: analysis, interpretation, validity evaluation. Sira Vegas (Universidad Politecnica de Madrid), Nyyti Saarimäki (Tampere University)

16:30 - 17:00
  • 16:30 (30m) Wrap-up (IASESE Advanced School). Valentina Lenarduzzi (University of Oulu), Nyyti Saarimäki (Tampere University), Davide Taibi (University of Oulu and Tampere University), Sira Vegas (Universidad Politecnica de Madrid)