Scalable Continuous Integration using Remote Execution (CCIW 2024)

Mon 27 - Fri 31 May 2024 Canada

Who

Ola Rozenfeld, Ulf Adams

Track

CCIW 2024

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Tue 28 May 2024 09:40 - 10:05 at Room 5 - CCIW Session 1
Chair(s): Tim A. D. Henderson

Abstract

Continuous integration at scale is a common problem for any large project. Multiple dimensions need to be optimized in order to provide a consistent experience for a large set of users. In particular, cost and performance are the two ends of the spectrum that a CI maintainer typically needs to balance. With a non-optimized CI setup, it’s very tempting to get great performance by simply adding more machines to the fleet; however, keeping those machines running persistently comes at a high infrastructure cost. It is therefore no surprise that one of the key difficulties of scalable CI is providing performant builds and tests at a reasonable and predictable cost.

At EngFlow, we specialize in optimizing the build and test cycles of large-scale organizations, for both interactive user builds and CI workflows. A typical scenario is a company that has adopted GitHub Actions as its CI solution and runs dozens or hundreds of workflows. The first step is to optimize compilation, linking, and test times by using the scalability of the cloud. This is accomplished by implementing the Remote Execution API (https://github.com/bazelbuild/remote-apis), which requires the client machine (e.g. a GitHub runner) to break down the steps in a declarative manner (note: this can be easily accomplished by any build system that supports the API, such as Bazel). The server can then optimally allocate resources to execute the workflows; namely, it can scale the number of machines up or down based on the incoming work.
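
To illustrate what describing a step "in a declarative manner" can buy, here is a minimal Python sketch. It is not the Remote Execution API itself, which defines its own protobuf messages; the names `Step`, `digest_bytes`, and `step_key` are hypothetical and exist only for this illustration. The idea shown is that a step described purely by its command line, environment, and input content hashes can be given a stable key, so a server could recognize identical work regardless of which client submits it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical declarative build step (NOT the real API's message types)."""
    argv: tuple            # command line, e.g. ("gcc", "-c", "a.c")
    input_digests: tuple   # (path, content-hash) pairs for every input file
    env: tuple = ()        # (name, value) pairs of environment variables


def digest_bytes(data: bytes) -> str:
    """Content hash of a blob; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def step_key(step: Step) -> str:
    """Derive a stable key from the step description alone.

    Inputs and environment are sorted so the key does not depend on the
    order in which the client happened to list them.
    """
    payload = json.dumps(
        {
            "argv": list(step.argv),
            "inputs": sorted(step.input_digests),
            "env": sorted(step.env),
        },
        sort_keys=True,
    )
    return digest_bytes(payload.encode("utf-8"))
```

Two clients that describe the same command over the same input contents obtain the same key, while changing a single input byte changes the key. That property is what lets a server treat steps as independent, schedulable units of work.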

In our most recent work, we’ve experimented with using the same remote execution infrastructure and optimal allocation to run the CI workflows themselves. This essentially means that CI machines (e.g. GitHub Actions runners) can be autoscaled and become cost-effective, instead of requiring a persistently running set of machines for a given CI workload.
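
As a rough illustration of autoscaling runners from incoming work, the following Python sketch computes a target fleet size from the number of queued and running jobs. It is a simplified assumption about how such a policy might look, not a description of EngFlow's implementation; the function `desired_workers` and all of its parameters are hypothetical.

```python
import math


def desired_workers(
    queued: int,
    running: int,
    slots_per_worker: int = 1,
    min_workers: int = 0,
    max_workers: int = 100,
) -> int:
    """Return a target worker count for the current load (hypothetical policy).

    The fleet is sized to fit all queued and running jobs, then clamped to
    the range [min_workers, max_workers]. With min_workers=0 the fleet can
    shrink to nothing when no CI work is pending.
    """
    if slots_per_worker < 1:
        raise ValueError("slots_per_worker must be at least 1")
    if queued < 0 or running < 0:
        raise ValueError("job counts must be non-negative")
    needed = math.ceil((queued + running) / slots_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, `desired_workers(queued=5, running=3, slots_per_worker=4)` returns 2, and an idle system with `min_workers=0` returns 0, which is the contrast with a persistently running fleet that the abstract draws.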

In this talk, we discuss the remote execution implementation of CI runners and its advantages, show that it can significantly reduce infrastructure costs, and describe the migration path we recommend.

Ola Rozenfeld

EngFlow Inc.

Canada

Ulf Adams