Scalable Continuous Integration using Remote Execution
Continuous integration at scale is a common problem for any large project. Several dimensions must be optimized to give a large set of users a consistent experience. In particular, cost and performance are the two ends of the spectrum that a CI maintainer needs to balance. With a non-optimized CI setup it is tempting to get great performance by simply adding more machines to the fleet; however, keeping those machines running persistently comes at a high infrastructure cost. It is therefore no surprise that one of the key difficulties of scalable CI is providing performant builds and tests at a reasonable and predictable cost.
At EngFlow, we specialize in optimizing build and test cycles for large-scale organizations, for both interactive user builds and CI workflows. A typical scenario is a company that has adopted GitHub Actions as its CI solution and runs dozens or hundreds of workflows. The first step is to optimize compilation, linking and test times by using the scalability of the cloud. This is accomplished by implementing the Remote Execution API (https://github.com/bazelbuild/remote-apis), which requires the client machine (e.g. a GitHub runner) to break the steps down in a declarative manner (note: this is easily accomplished with any build system that supports the API, such as Bazel). The server can then allocate its resources optimally to execute the workflows; namely, it can scale the number of machines up or down based on the incoming work.
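To make "declarative" concrete, here is a minimal Python sketch of a build step described purely by its command line and the content hashes of its inputs. The `Action` class, `digest` helper and `key` method are hypothetical simplifications for illustration, not the message types defined by the Remote Execution API. The point is only that a step described this way gets a stable key, so a server could schedule it on any machine or recognize work it has already done.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(data: bytes) -> str:
    """Content hash of an input file (SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Action:
    """Hypothetical, simplified description of one build step."""
    argv: tuple            # command line to run, e.g. ("gcc", "-c", "a.c")
    input_digests: tuple   # (path, content hash) pairs the command reads

    def key(self) -> str:
        # Sort inputs so the key does not depend on declaration order.
        payload = json.dumps(
            {"argv": list(self.argv), "inputs": sorted(self.input_digests)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Assumed example inputs; the file names and contents are made up.
a = Action(
    argv=("gcc", "-c", "a.c"),
    input_digests=(("a.c", digest(b"int main(){}")), ("a.h", digest(b""))),
)
b = Action(
    argv=("gcc", "-c", "a.c"),
    input_digests=(("a.h", digest(b"")), ("a.c", digest(b"int main(){}"))),
)
c = Action(
    argv=("gcc", "-c", "a.c"),
    input_digests=(("a.c", digest(b"int main(){return 1;}")), ("a.h", digest(b""))),
)
```

In this sketch `a` and `b` produce the same key because they declare the same command and inputs, while `c` differs because one input's content changed.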
In our most recent work, we have experimented with using the same remote execution infrastructure and optimal allocation to run the CI workflows themselves. This essentially means that CI machines (e.g. GitHub Actions runners) can be autoscaled and become cost-effective, instead of requiring a set of machines to run persistently for a given CI workload.
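As a rough illustration of why autoscaling saves cost compared with a persistent fleet, the following Python sketch computes how many runner machines a pool would want for the current load. The function name, its parameters and the default limits are assumptions made for this example; the abstract does not describe the actual scaling policy.

```python
import math


def desired_workers(queued: int, running: int, slots_per_worker: int,
                    min_workers: int = 0, max_workers: int = 100) -> int:
    """Number of machines needed for the current load, clamped to pool limits.

    All parameters are hypothetical: `queued` and `running` count CI jobs,
    and `slots_per_worker` is how many jobs one machine runs concurrently.
    """
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    demand = queued + running
    needed = math.ceil(demand / slots_per_worker)
    # Never go below the floor or above the ceiling of the pool.
    return max(min_workers, min(max_workers, needed))
```

With `min_workers=0`, an idle pool scales to zero machines, which is where the saving over an always-on fleet would come from; a nonzero floor trades some of that saving for faster job start-up.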
In this talk we present the remote execution implementation of CI runners and its advantages, show that it can significantly reduce infrastructure costs, and describe the migration path we recommend.
Tue 28 May. Displayed time zone: Eastern Time (US & Canada)
Session 08:30 - 10:30
Time | Duration | Type | Title | Track | Speakers
08:30 | 20m | Day opening | Welcome to CCIW | CCIW | Tim A. D. Henderson (Google)
08:50 | 25m | Talk | Thinktank: Leveraging LLM Reasoning for Advanced Task Execution in CI/CD | CCIW | Tim Keller (SAP SE)
09:15 | 25m | Talk | Widespread Error Detection in Large Scale Continuous Integration Systems | CCIW | Stanislaw Swierc (Meta Platforms, Inc.), James Lu (Meta Platforms, Inc.), Thomas Yi (Meta Platforms, Inc.)
09:40 | 25m | Talk | Scalable Continuous Integration using Remote Execution | CCIW |
10:05 | 25m | Talk | Replay-Based Continual Learning for Test Case Prioritization | CCIW | Asma Fariha (Ontario Tech University), Akramul Azim (Ontario Tech University), Ramiro Liscano (Ontario Tech University)