nvshare: Practical GPU Sharing without Memory Size Constraints (ICSE 2024 - Demonstrations)

Who

Georgios Alexopoulos, Dimitris Mitropoulos

Track

ICSE 2024 Demonstrations

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

Use conference time zone: (GMT+01:00) LisbonSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 17 Apr 2024 15:15 - 15:22 at Maria Helena Vieira da Silva - Dependability and Formal methods 1 Chair(s): Domenico Bianculli

Abstract

GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each colocated job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e.,exceptions that are raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way we permit each application to utilize the entire physical GPU memory (Video RAM). To prevent thrashing (a situation in which page faults dominate execution time) in a reliable manner, nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in convential sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5sc5AICY

Link to Preprint

https://dimitro.gr/assets/papers/AM23.pdf

Georgios Alexopoulos

University of Athens

Greece

Dimitris Mitropoulos