ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia

Mahtab Nejati*, Mahmoud Alfadel, Shane McIntosh

{mahtab.nejati,malfadel,shane.mcintosh}@uwaterloo.ca

Replication package available at https://doi.org/10.5281/zenodo.7042929

Paper available at https://rebels.cs.uwaterloo.ca/papers/icse2023_nejati.pdf

Introduction

Build systems automate the integration of source code into executables. Maintaining build systems is known to be challenging. Lax build maintenance can lead to slow builds, costly build breakages, or unexpected software behaviour. Code review is a broadly adopted practice to improve software quality. Yet, little is known about how code review is applied to build specifications.

We present the first empirical study of how code review is practiced in the context of build specifications. In a mixed-methods study, we apply both quantitative (repository mining) and qualitative empirical methods (closed coding, open coding, card sorting, and semi-structured interviews) to examine how the Qt and Eclipse communities review changes to build specifications.

Through quantitative analysis of 502,931 change sets from the Qt and Eclipse communities, we observe that changes to build specifications are discussed during code review at least two times less frequently than changes to production and test code. A qualitative analysis of 500 change sets reveals that (i) comments on changes to build specifications are more likely to point out defects than the rates reported in the literature for production and test code, and (ii) evolvability and dependency-related issues are the most frequently raised patterns of issues. Follow-up interviews with nine developers, whose experience ranges from 1 to 40 years, point out social and technical factors that hinder rigorous review of build specifications, such as a prevailing lack of understanding of, and interest in, build systems among developers, and a lack of dedicated tooling to support the code review of build specifications.

Replication Package Content

This replication package enables validation of the artifacts behind our quantitative and qualitative analyses, as well as replication and extension of the study in future research. In particular, the package contains the following:

a) The means to replicate the quantitative analysis results (RQ1): The replication package allows users to measure the prevalence and intensity of code review discussions across file types using metrics such as the ratio of modified files with comments, the average length of discussions in terms of the number of comments, the average length of comments, and the average number of reviewers. This part of the package also enables users to test the statistical significance of the differences in these objective metrics and to compute the odds ratios of receiving review comments for one file type relative to another (a minimal sketch of these computations follows the list below).

The replication package includes everything necessary to conduct all of the steps, i.e., data collection, data cleaning, and the quantitative analyses, and the documentation is thorough enough to support reuse and repurposing of the scripts in future studies. To ease evaluation of the analyses, we also provide the cleaned data and a Docker image of the environment, along with instructions on how to run the analyses in the Docker container. This is equivalent to skipping the data collection and data cleaning steps, which are time- and resource-intensive, and proceeding directly to the analyses and the measurement of the objective metrics.

b) The sampled review comments for the qualitative analyses and their assigned labels (RQ2 and RQ3): The replication package also includes the complete list of labelled review comments, covering both comment purposes and issue patterns. The data is provided as a Microsoft Excel spreadsheet, with summary statistics on the labels in separate sheets (a loading sketch also follows the list below).

c) The interview protocol (RQ4): Our semi-structured interview protocol is also provided in the replication package. The transcribed interviews are omitted due to confidentiality concerns; however, researchers can follow the same interview structure and protocol to conduct similar studies.
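To make the objective metrics in (a) more concrete, the following is a minimal sketch in Python (using pandas and SciPy) of how the ratio of modified files with comments and the odds ratio of receiving review comments could be computed. It is not taken from the package's scripts; the DataFrame and its columns (file_type, num_comments) are hypothetical stand-ins for the mined per-file review records.

```python
import pandas as pd
from scipy.stats import fisher_exact

def comment_ratio(files: pd.DataFrame, file_type: str) -> float:
    """Fraction of modified files of `file_type` with at least one review comment."""
    subset = files[files["file_type"] == file_type]
    return (subset["num_comments"] > 0).mean()

def comment_odds_ratio(files: pd.DataFrame, type_a: str, type_b: str):
    """Odds ratio (and Fisher's exact p-value) of receiving comments, type_a vs. type_b."""
    subset = files[files["file_type"].isin([type_a, type_b])]
    table = pd.crosstab(subset["file_type"] == type_a, subset["num_comments"] > 0)
    return fisher_exact(table.values)

# Toy data standing in for per-file review records.
files = pd.DataFrame({
    "file_type":    ["build", "build", "production", "production", "test", "test"],
    "num_comments": [0, 2, 3, 1, 0, 4],
})
print(comment_ratio(files, "build"))                     # 0.5
print(comment_odds_ratio(files, "build", "production"))  # (odds ratio, p-value)
```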
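Similarly, the labelled review comments in (b) can be inspected with off-the-shelf tooling. The sketch below assumes hypothetical file, sheet, and column names rather than the exact layout shipped in the package.

```python
import pandas as pd

# sheet_name=None loads every sheet of the workbook into a dict of DataFrames.
sheets = pd.read_excel("labelled_review_comments.xlsx", sheet_name=None)

for name, frame in sheets.items():
    print(f"--- {name}: {len(frame)} rows ---")
    if "label" in frame.columns:  # hypothetical column name for the assigned label
        print(frame["label"].value_counts())
```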

Technical Assumptions

For evaluation purposes, we assume a basic understanding of Docker, the ability to install it on an arm64 or amd64 machine, and the ability to follow simple instructions on the Bash command line. Users who aim to replicate the whole study and/or repurpose it should also have a working knowledge of JupyterLab, Python, and R. The results of the qualitative analyses are presented as Microsoft Excel sheets that incorporate formulas.

Download Information

To download the replication package, visit our online appendix at https://doi.org/10.5281/zenodo.7042929. Complete instructions on how to run the analyses are provided in the package. A pre-print of our study is also available online at https://rebels.cs.uwaterloo.ca/papers/icse2023_nejati.pdf.

Badges Claimed

We claim the Artifacts Available badge, as we have made our replication package publicly available. We also claim the Artifacts Evaluated - Reusable badge, as we provide detailed documentation of the step-by-step execution of the study. All required scripts and data are available, and all outputs from which the results in the paper were compiled are also included and documented. The replication package supports easy execution of the analyses for verification of the artifact, as well as replication of the entire process from scratch. The documentation not only walks users through the execution process but also points out where modifications should be applied to reuse and repurpose the artifacts on different subject projects (see the data cleaning step).