What Breaks Google? (CCIW 2023 - CI/CD Industry Workshop (CCIW) 2023)

Who

Avi Kondareddy, Abhayendra Singh, Tim A. D. Henderson

Track

CCIW 2023

Time Zone

The program is currently displayed in (GMT+01:00) Dublin.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 20 Apr 2023 12:00 - 12:30 at Macken - Session 2

Abstract

In an ideal Continuous Integration (CI) workflow, all potentially impacted builds/tests would be run before submission for every proposed change. In large-scale environments like Google’s mono-repository, this is not feasible to do in terms of both latency and compute cost, given the frequency of change requests and overall size of the codebase. Instead, the compromise is to run more comprehensive testing at later stages of the development lifecycle – batched after submission.

TAP (Test Automation Platform), the main CI system at Google, is responsible for building and testing hermetic build/test “targets” (libraries, binaries, tests, etc.) across the mono-repository. Targets are built/run both during individual pull requests before submission (“presubmit”) and over ranges of submitted commits (“postsubmit”). At presubmit time, TAP runs builds/tests that are directly relevant to the team/project corresponding to the change. Changes that have all presubmit runs passing are merged sequentially into the mainline branch. At postsubmit time, TAP periodically runs, at a change close to HEAD, all builds/tests that were potentially affected since the last test cycle.
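To make "potentially affected" concrete, here is a minimal sketch that walks a reverse-dependency graph outward from the changed files and records how many edges away each target is. The graph layout, the target labels, and the `affected_targets` helper are all hypothetical illustrations, not TAP's actual data model or API.

```python
from collections import deque


def affected_targets(reverse_deps, changed_files):
    """Return {target: distance} for every target reachable from the changed files.

    reverse_deps maps a node (file or target) to the targets that depend on
    it directly. Distance counts dependency edges from the nearest changed file.
    """
    distance = {}
    seen = set(changed_files)
    queue = deque((f, 0) for f in changed_files)
    while queue:
        node, dist = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                distance[dependent] = dist + 1
                queue.append((dependent, dist + 1))
    return distance


# Hypothetical build graph: file -> library -> test/binary -> test.
REVERSE_DEPS = {
    "lib/strings.cc": ["//lib:strings"],
    "//lib:strings": ["//lib:strings_test", "//app:server"],
    "//app:server": ["//app:server_test"],
}

result = affected_targets(REVERSE_DEPS, ["lib/strings.cc"])
```

Breadth-first traversal is used so that the first time a target is reached is also its shortest distance from the change.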

Traditionally, developers are simply notified of breakages which they then need to root-cause and fix. Given a range of commits over which a target has broken, “culprit finding” refers to the task of finding the commit where the breakage is introduced (the “culprit”). This is further complicated by the prevalence of nondeterminism (“flakiness”) requiring multiple runs to confirm real breakages. Over the past several years, efficient automated culprit finders have been deployed to determine the “culprit” changes for all postsubmit breakages.
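As an illustration of culprit finding under flakiness, the sketch below uses plain bisection over a commit range and reruns the test at each probe, treating a commit as broken only if every rerun fails. This is a textbook technique chosen for illustration, not a description of the culprit finders deployed at Google; `run_test`, `reruns`, and the commit indices are assumptions.

```python
def is_broken(run_test, commit, reruns=3):
    """Treat a commit as broken only if every rerun fails.

    run_test(commit) returns True on pass. A single pass clears the commit,
    which guards against flaky failures (but not against flaky passes).
    """
    return all(not run_test(commit) for _ in range(reruns))


def find_culprit(run_test, first, last, reruns=3):
    """Bisect commits first..last, where `first` passes and `last` fails.

    Returns the index of the earliest commit at which the test is broken,
    assuming the breakage persists once introduced.
    """
    lo, hi = first, last  # invariant: lo is good, hi is broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(run_test, mid, reruns):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical history: the test passes before commit 7 and fails from 7 on.
culprit = find_culprit(lambda commit: commit < 7, first=0, last=10)
```

Each probe costs up to `reruns` executions, so the total is roughly `reruns * log2(range size)` test runs; the rerun count trades compute for confidence that a failure is real.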

The availability of this labeled dataset of postsubmit breakages, attributed to their culprits, opens the possibility of predictive test selection at postsubmit time and motivates our exploratory analysis of which features of submitted code commits are predictive of changes that introduce code defects. To make testing more compute efficient and decrease the latency in discovering breakages introduced into the codebase, TAP Postsubmit is developing a new scheduling algorithm that utilizes Bug Prediction metrics, features from the change, and historical information about the targets to predict the likelihood of a target being broken by a change. Using these predictions, small subsets of targets at increased risk may be scheduled more frequently to more quickly uncover breakages. This work will examine the association between some of our selected features and culprits in the Google codebase.

Previous work at Google and elsewhere has found test execution history and coarse-grained code metrics to be useful for test selection. Now, culprit finding allows us to perform this analysis at the individual commit level instead of postsubmit cycle granularity. We have looked more closely at other features viably accessible in real time, such as tokens within the change description and the build graph distance between targets and files within the commit, and found that many are fairly predictive of code introducing breakages. For example, the presence of individual tokens in the change description plus simple metrics such as LOC can capture 98% of culprit changes while filtering out 30% of safe changes. In this paper, we present our results for these and other features, along with the implied resource savings if they were used for test selection at Google.
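The two figures quoted above correspond to two rates that can be computed for any change filter: the fraction of culprit changes it flags as risky, and the fraction of safe changes it filters out. The sketch below shows that computation on toy data. The token list, the LOC threshold, and the four sample changes are invented for illustration and are not the features or results from the paper.

```python
# Hypothetical rule parameters, chosen only to exercise the computation.
RISKY_TOKENS = {"refactor", "migrate"}
LOC_THRESHOLD = 50


def predict_risky(change):
    """Flag a change if its description has a risky token or it is large."""
    tokens = set(change["description"].lower().split())
    return bool(tokens & RISKY_TOKENS) or change["loc"] >= LOC_THRESHOLD


def evaluate_filter(changes, predictor):
    """Return (culprit capture rate, safe-change filter rate).

    Assumes `changes` contains at least one culprit and one safe change.
    """
    culprits = [c for c in changes if c["is_culprit"]]
    safe = [c for c in changes if not c["is_culprit"]]
    captured = sum(1 for c in culprits if predictor(c))
    filtered = sum(1 for c in safe if not predictor(c))
    return captured / len(culprits), filtered / len(safe)


# Toy labeled data, standing in for culprit-finder output.
CHANGES = [
    {"description": "Refactor request parsing", "loc": 120, "is_culprit": True},
    {"description": "Fix typo in comment", "loc": 2, "is_culprit": False},
    {"description": "Migrate config loader", "loc": 30, "is_culprit": True},
    {"description": "Update docs", "loc": 80, "is_culprit": False},
]

capture_rate, filter_rate = evaluate_filter(CHANGES, predict_risky)
```

A filter of this kind is useful for test selection when the capture rate stays close to 1 while the filter rate is as high as possible, since every filtered safe change is testing work that can be deferred.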

Avi Kondareddy

Google LLC

United States

Abhayendra Singh

Google LLC

Tim A. D. Henderson

Google LLC

United States

Session Program

Thu 20 Apr
Displayed time zone: Dublin

11:00 - 12:30	Session 2 (CCIW) at Macken

11:00 30m Talk		How we use Hermetic, Ephemeral Test Environments at Google to reduce Test Flakiness (CCIW) - Carlos Arguelles (Google LLC)
11:30 30m Talk		Enabling Pre-Merge CI on your TV (CCIW) - Jose Soltren (Roku), Kyle Mulligan (Roku)
12:00 30m Talk		What Breaks Google? (CCIW) - Avi Kondareddy (Google LLC), Abhayendra Singh (Google LLC), Tim A. D. Henderson (Google LLC)