ICST 2023
Sun 16 - Thu 20 April 2023 Dublin, Ireland
Thu 20 Apr 2023 14:30 - 15:00 at Macken - Session 3

Any software development activity, small or large, benefits from fast turnaround times. Our unique distributed compilation framework at SAP enables us to build large projects, such as HANA, using all available resources. The key goal of our proposed method is to reduce build times, speed up the development cycle, and cut hardware costs by using our infrastructure efficiently. Compile jobs are inherently complex, non-linear graph-transformation tasks whose memory usage and run time are hard to predict. The resulting memory pressure can cause out-of-memory situations. To address this issue, we present a machine-learning-based method that predicts the memory consumption of compile jobs and maximizes the number of parallel compile jobs on the available hardware.

Typically, the number of compile jobs per host is tuned manually based on expert domain knowledge. The maximal memory consumption per job is determined once for all compile jobs, and the number of jobs per host is derived from the available CPU cores and installed memory. Such a static approach necessarily both over- and underestimates the true memory usage and cannot adapt dynamically when memory requirements change.
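For illustration, a minimal sketch of such a static heuristic; all constants here are hypothetical examples, not our actual production values:

```python
# Static heuristic: one fixed memory estimate applied to every compile job.
# All values are hypothetical illustrations.
CORES_PER_HOST = 64
MEMORY_PER_HOST_GB = 256
ASSUMED_JOB_MEMORY_GB = 4  # single worst-case estimate for all jobs

def static_jobs_per_host() -> int:
    """Derive the parallel job count from cores and installed memory."""
    by_memory = MEMORY_PER_HOST_GB // ASSUMED_JOB_MEMORY_GB
    return min(CORES_PER_HOST, by_memory)

print(static_jobs_per_host())  # 64 cores vs. 64 memory slots -> 64 jobs
```

If the single estimate is too high, cores sit idle; if it is too low, parallel jobs can exhaust memory together.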

The hypothesis of our work is that the true memory usage depends on the source code in the files being compiled. Based on that hypothesis, we present a novel CI/CD task that predicts the memory consumption of compile jobs solely from the content of the source files. Furthermore, we use this information to schedule the maximal number of parallel jobs, which allows us to reliably utilize our hardware to the fullest.
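One way such prediction-driven scheduling could look is sketched below; the greedy admission policy and all names are assumptions for illustration, not our exact scheduler:

```python
def admit_jobs(pending, predicted_mem_gb, host_mem_gb):
    """Greedily admit as many parallel compile jobs as the predicted
    memory budget allows. predicted_mem_gb maps each source file to the
    upper bound of its predicted memory class (hypothetical interface)."""
    admitted, used = [], 0.0
    for job in pending:
        need = predicted_mem_gb[job]
        if used + need <= host_mem_gb:
            admitted.append(job)
            used += need
    return admitted

jobs = ["a.cpp", "b.cpp", "c.cpp"]
pred = {"a.cpp": 2.0, "b.cpp": 11.0, "c.cpp": 2.0}
print(admit_jobs(jobs, pred, host_mem_gb=8.0))  # ['a.cpp', 'c.cpp']
```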

The driving constraints of our approach are to develop an understandable, observable, compute-efficient, and portable learning pipeline that can be integrated into our existing distributed compilation framework. For this purpose, we extract token n-grams, weight them with term frequency-inverse document frequency (TF-IDF), and feed them into a multinomial naive Bayes classifier.
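A minimal sketch of such a pipeline, assuming scikit-learn's TfidfVectorizer and MultinomialNB; the tokenization settings and training data are placeholders:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Token n-grams over source text; a real setup would use a
# source-code-aware tokenizer instead of the default word analyzer.
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 3))
model = make_pipeline(vectorizer, MultinomialNB())

# Hypothetical training data: file contents and their memory class (0-4).
sources = ["#include <map>\nint main() { return 0; }",
           "template <typename T> struct Heavy { T data[1024]; };"]
labels = [0, 4]
model.fit(sources, labels)
print(model.predict(["#include <vector>\nint main() { return 0; }"]))
```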

Memory usage is a continuous target, which we must divide into discrete bins to use a Bayesian classifier. We discretize its value into five classes and predict the target memory class for each source file. The learned parameters of our pipeline are thus probabilities indicating which tokens contribute most to which memory class.
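The discretization step might look as follows; the bin edges below are hypothetical, since the actual class boundaries are not stated here:

```python
import numpy as np

# Hypothetical bin edges in GB separating the five memory classes.
BIN_EDGES_GB = np.array([1.0, 2.0, 4.0, 8.0])

def memory_class(peak_gb: float) -> int:
    """Map a continuous peak-memory measurement to one of five classes."""
    return int(np.digitize(peak_gb, BIN_EDGES_GB))

print([memory_class(x) for x in (0.5, 1.5, 3.0, 6.0, 11.0)])  # [0, 1, 2, 3, 4]
```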

To prevent out-of-memory crashes, it is crucial to identify source files with high memory consumption during compilation. We assign the source files with the largest memory usage (11 GB) to their correct memory class with an accuracy of 89%.