ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal

GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each colocated job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e., exceptions that are raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way, we permit each application to utilize the entire physical GPU memory (Video RAM, VRAM). To reliably prevent thrashing (a situation in which page faults dominate execution time), nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in conventional sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5sc5AICY
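The serialization idea in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not nvshare's implementation: the function names, the first-come-first-served ordering, and the burst and working-set figures are all hypothetical. It shows overlapping GPU bursts being pushed back so that only one application runs at a time, and a simplified reading of why serialized jobs could each use up to the full VRAM while colocated jobs must fit together.

```python
def serialize_bursts(bursts):
    """Toy model: run overlapping GPU bursts one after another.

    bursts: list of (app_name, requested_start, duration) tuples.
    Returns a list of (app_name, begin, end) with no two bursts overlapping.
    Assumption: first-come-first-served by requested start time; the
    abstract does not specify nvshare's actual scheduling policy.
    """
    schedule = []
    gpu_free_at = 0
    for app, start, duration in sorted(bursts, key=lambda b: b[1]):
        begin = max(start, gpu_free_at)  # wait until the GPU is free
        end = begin + duration
        schedule.append((app, begin, end))
        gpu_free_at = end
    return schedule


def fits_when_colocated(working_sets, vram):
    """Fraction-based sharing: all jobs must fit in VRAM at the same time."""
    return sum(working_sets) <= vram


def fits_when_serialized(working_sets, vram):
    """Simplified assumption: with serialized bursts, only the active
    job's working set needs to be resident in VRAM at any moment."""
    return max(working_sets) <= vram


# Hypothetical workload: A and B overlap, C arrives after both finish.
bursts = [("A", 0, 5), ("B", 2, 4), ("C", 20, 1)]
schedule = serialize_bursts(bursts)

# Hypothetical working sets in GiB on a 16 GiB GPU.
working_sets = [12, 10]
colocated_ok = fits_when_colocated(working_sets, 16)
serialized_ok = fits_when_serialized(working_sets, 16)
```

In this toy workload, B's burst is delayed until A finishes, while C is unaffected because it does not overlap with anything. The two working sets exceed VRAM together but each fits on its own, which is the oversubscription case the abstract describes.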

Wed 17 Apr

Displayed time zone: Lisbon

14:00 - 15:30
Dependability and Formal methods 1
Software Engineering in Practice / Demonstrations / Research Track at Maria Helena Vieira da Silva
Chair(s): Domenico Bianculli University of Luxembourg
14:00
15m
Talk
REDriver: Runtime Enforcement for Autonomous Vehicles
Research Track
Yang Sun Singapore Management University, Chris Poskitt Singapore Management University, Xiaodong Zhang, Jun Sun Singapore Management University
Pre-print
14:15
15m
Talk
Scalable Relational Analysis via Relational Bound Propagation
Research Track
Clay Stevens Iowa State University, Hamid Bagheri University of Nebraska-Lincoln
DOI Pre-print
14:30
15m
Talk
Kind Controllers and Fast Heuristics for Non-Well-Separated GR(1) Specifications
Research Track
Ariel Gorenstein Tel Aviv University, Shahar Maoz Tel Aviv University, Jan Oliver Ringert Bauhaus-University Weimar
14:45
15m
Talk
On the Difficulty of Identifying Incident-Inducing Changes
Software Engineering in Practice
Eileen Kapel ING & Delft University of Technology, Luís Cruz Delft University of Technology, Diomidis Spinellis Athens University of Economics and Business & Delft University of Technology, Arie van Deursen Delft University of Technology
15:00
15m
Talk
Autonomous Monitors for Detecting Failures Early and Reporting Interpretable Alerts in Cloud Operations
Software Engineering in Practice
Adha Hrusto Lund University, Sweden, Per Runeson Lund University, Magnus C Ohlsson System Verification
15:15
7m
Talk
nvshare: Practical GPU Sharing without Memory Size Constraints
Demonstrations
Georgios Alexopoulos University of Athens, Dimitris Mitropoulos University of Athens
Pre-print
15:22
7m
Talk
Daedalux: An Extensible Platform for Variability-Aware Model Checking
Demonstrations
Sami Lazreg Visteon Electronics and Université Côte d'Azur, Maxime Cordy University of Luxembourg, Luxembourg, Simon Thrane Hansen SnT, University of Luxembourg, Axel Legay Université Catholique de Louvain, Belgium