Fri 13 May 2022 12:20 - 12:25 at ICSE room 3-even hours - Release Engineering and DevOps 3 Chair(s): Diego Fontdevila
In principle, continuous integration (CI) practices allow modern software organizations to build and test their products after each code change to detect quality issues as soon as possible. In reality, issues with the build scripts (e.g., missing dependencies) and/or the presence of ‘‘flaky tests’’ lead to build failures that essentially are false positives, not indicative of actual quality problems of the source code. For our industrial partner, which is active in the video game industry, such ‘‘brown builds’’ not only require multidisciplinary teams to spend more effort interpreting or even re-running the build, leading to substantial redundant build activity, but also slows down the integration pipeline. Hence, this paper aims to prototype and evaluate approaches for early detection of brown build results based on textual similarity to build logs of prior brown builds. The approach is tested on 7 projects (6 closed-source from our industrial collaborators and 1 open-source, Graphviz). We find that our model manages to detect brown builds with a mean F1-score of 53% on the studied projects, which is three times more than the best baseline considered, and at least as good as human experts (but with less effort). Furthermore, we found that cross-project prediction can be used for a project’s onboarding phase, that a training set of 30-weeks works best, and that our retraining heuristics keep the F1-score higher than the baseline, while retraining only every 4-5~weeks.