Software Batch Testing to Save Build Test Resources and to Reduce Feedback Time
Thu 12 May 2022, 22:05 - 22:10, at ICSE room 2-even hours - Release Engineering and DevOps 2. Chair(s): Xin Peng
Testing is expensive, and batching tests has the potential to reduce test costs. The continuous integration strategy of testing each commit or change individually helps to quickly identify faults but leads to a maximal number of test executions. Large companies that have a massive number of commits, e.g., Google and Facebook, or have expensive test infrastructure, e.g., Ericsson, must batch changes together to reduce the total number of test runs. For example, if eight builds are batched together and there is no failure, then we have tested eight builds with one execution, saving seven executions. However, when a failure occurs, it is not immediately clear which build is the cause of the failure. A bisection is run to isolate the failing build, i.e., the culprit build. In our eight-build example, a failure will require an additional six executions, resulting in a saving of one execution.
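To make the bisection accounting concrete, here is a minimal Python sketch of the BatchBisect idea (an illustration, not the authors' implementation): a batch is tested once and, on failure, both halves are tested recursively until every culprit is isolated, since a batch may contain more than one failing build. The run_tests callback and the hard-coded failing build are hypothetical stand-ins for a real CI test run.

def batch_bisect(builds, run_tests):
    """Return (culprit_builds, executions_used) for one batch of builds."""
    executions = 1                      # the batch itself is tested once
    if run_tests(builds):               # whole batch passes: one run covers every build
        return [], executions
    if len(builds) == 1:                # a failing batch of one is the culprit
        return list(builds), executions
    mid = len(builds) // 2              # otherwise bisect and test both halves,
    left, left_cost = batch_bisect(builds[:mid], run_tests)    # because several
    right, right_cost = batch_bisect(builds[mid:], run_tests)  # culprits are possible
    return left + right, executions + left_cost + right_cost

if __name__ == "__main__":
    failing = {5}                                         # hypothetical culprit build
    run_tests = lambda batch: not (set(batch) & failing)  # stand-in for a real CI run
    culprits, cost = batch_bisect(list(range(8)), run_tests)
    print(culprits, cost)                                 # -> [5] 7 (vs 8 with TestAll)

Running the example reproduces the accounting above: one execution for the batch plus six during bisection, seven in total, versus eight under TestAll.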
In this work, we re-evaluate batching approaches developed in industry on large open source projects using Travis CI. We also introduce novel batching approaches. In total, we evaluate six approaches. The first is the baseline approach that tests each build individually. The second is the existing bisection approach. The third uses a batch size of four, which we show mathematically reduces the number of executions without requiring bisection. The fourth combines the two prior techniques, introducing a stopping condition to the bisection. The final two approaches use models of build change risk to isolate risky changes and test them in smaller batches. We find that, compared to the TestAll baseline, on average, the approaches reduce the number of build test executions across projects by 46%, 48%, 50%, 44%, and 49% for BatchBisect, Batch4, BatchStop4, RiskTopN, and RiskBatch, respectively. The greatest reduction in executions is achieved by BatchStop4, at 50%. However, the simple Batch4 approach does not require bisection and still achieves a reduction of 48%. In a larger sample of projects, we find that a project’s failure rate is strongly correlated with execution savings (Spearman r = −0.97, p << 0.001). Using Batch4, 85% of projects see savings. All projects whose builds fail less than 40% of the time will benefit from batching. In terms of feedback time, compared to TestAll, we find that BatchBisect, Batch2, Batch4, and BatchStop4 reduce the average feedback time by 33%, 16%, 32%, and 37%, respectively. Simple batching not only saves resources but also reduces feedback time, without introducing any slip-throughs and without changing the test run order.
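As a rough illustration of why a fixed batch size of four can save executions without any bisection (a back-of-the-envelope calculation, not the paper's formal derivation), assume each batch of four builds fails with probability p and that a failing batch is simply retested one build at a time:

\[
\mathbb{E}[\text{executions per batch of four}] = 1\cdot(1-p) + (1+4)\cdot p = 1 + 4p,
\]

which stays below the four executions TestAll needs whenever p < 3/4. Note that p here is the probability that at least one build in the batch fails, not the per-build failure rate cited above.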
We suggest that most projects should adjust their CI pipelines to use a batch size of at least two. We release our scripts and data for replication, as well as the BatchBuilder tool that automatically batches submitted changes on GitHub for testing on Travis CI. Since the tool reports individual results for each pull request or pushed commit, the batching happens in the background and the development process is unchanged.
Thu 12 May. Displayed time zone: Eastern Time (US & Canada).
11:00 - 12:00 | Software Testing 13 (Journal-First Papers / Technical Track) at ICSE room 4-odd hours. Chair(s): Peter C. Rigby

11:00 (5m talk) | Software Batch Testing to Save Build Test Resources and to Reduce Feedback Time. Journal-First Papers. Mohammad Javad Beheshtian (Concordia University), Amir Bavand (Concordia University), Peter Rigby (Concordia University, Montreal, Canada).

11:05 (5m talk) | A Family of Experiments on Test-Driven Development. Journal-First Papers. Adrian Santos Parrilla (University of Oulu), Sira Vegas (Universidad Politecnica de Madrid), Oscar Dieste (Universidad Politécnica de Madrid), Fernando Uyaguari (ETAPA Telecommunications Company), Ayse Tosun (Istanbul Technical University), Davide Fucci (Blekinge Institute of Technology), Burak Turhan (University of Oulu), Giuseppe Scanniello (University of Basilicata), Simone Romano (University of Bari), Itir Karac (University of Oulu), Marco Kuhrmann (Reutlingen University), Vladimir Mandić (Faculty of Technical Sciences, University of Novi Sad), Robert Ramač (Faculty of Technical Sciences, University of Novi Sad), Dietmar Pfahl (University of Tartu), Christian Engblom (Ericsson), Jarno Kyykka (Ericsson), Kerli Rungi (Testlio), Carolina Palomeque (ETAPA Telecommunications Company), Jaroslav Spisak (PAF), Markku Oivo (University of Oulu), Natalia Juristo (Universidad Politecnica de Madrid).

11:10 (5m talk) | Prioritizing Mutants to Guide Mutation Testing. Technical Track. Samuel Kaufman (University of Washington), Ryan Featherman (University of Washington), Justin Alvin (University of Massachusetts Amherst), Bob Kurtz (George Mason University, USA), Paul Ammann (George Mason University, USA), René Just (University of Washington).

11:15 (5m talk) | Automated Testing of Software that Uses Machine Learning APIs. Technical Track. Chengcheng Wan (The University of Chicago), Shicheng Liu (University of Chicago), Sophie Xie (University of California, Berkeley), Yifan Liu (University of Chicago), Henry Hoffmann (University of Chicago), Michael Maire (University of Chicago), Shan Lu (University of Chicago).

11:20 (5m talk) | CONFETTI: Amplifying Concolic Guidance for Fuzzers. Technical Track. James Kukucka (George Mason University), Luís Pina (University of Illinois at Chicago), Paul Ammann (George Mason University, USA), Jonathan Bell (Northeastern University).

11:25 (5m talk) | On the Reliability of Coverage-Based Fuzzer Benchmarking. Technical Track. Marcel Böhme (MPI-SP, Germany and Monash University, Australia), Laszlo Szekeres (Google), Jonathan Metzman (Google).