Dynamic Resource Allocation for Deadline-Constrained Neural Network Training (SEAMS 2025 - Research Track)

Who

Luciano Baresi, Marco Garlini, Giovanni Quattrocchi

Track

SEAMS 2025 Research Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 28 Apr 2025 14:00 - 14:25 at 204 - Session 3: Resource Allocation. Chair(s): Matteo Camilli

Abstract

Neural Networks (NNs) serve as the backbone for various applications, including computer vision, speech recognition, and natural language processing. Due to their iterative nature, training NNs is a highly compute-intensive task that is typically executed using a statically allocated set of devices (e.g., CPUs or GPUs). This static allocation prevents adjusting priorities, making it impossible to reassign resources to urgent tasks and potentially causing high-priority training jobs to miss their expected completion times.

This paper proposes DECOR-NN (DEadline COnstrained Resource allocation for Neural Networks), a control mechanism for NN training that dynamically allocates resources according to a user-defined deadline (i.e., a Service Level Agreement), ensuring the training phase completes within the specified time. The solution leverages control theory and has been developed on top of PyTorch, a widely used framework for training NNs. DECOR-NN dynamically allocates either GPUs or fractions of CPUs to meet user deadlines and also allows users to modify the deadline at runtime to accommodate changes in job priorities. A comprehensive empirical evaluation using three benchmark applications demonstrates that DECOR-NN successfully completes training jobs with an average deviation from the deadline of only 1.75%.
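
The abstract does not describe the controller itself, so the following is only a minimal sketch of the general idea of deadline-driven allocation, not DECOR-NN's actual mechanism. It assumes a toy job whose throughput scales linearly with the number of (fractional) CPU cores, and it recomputes the allocation once per control period from the remaining work and the remaining time. All names (DeadlineController, next_allocation, simulate) and all parameter values are hypothetical and are not part of DECOR-NN or PyTorch.

```python
from dataclasses import dataclass


@dataclass
class DeadlineController:
    """Toy feedback controller (illustrative only, NOT the paper's controller)."""
    total_iters: int          # training iterations the job must complete
    deadline_s: float         # user-defined deadline, in seconds from job start
    min_cores: float = 0.5    # assumed lower bound on the allocation
    max_cores: float = 16.0   # assumed upper bound on the allocation

    def next_allocation(self, done_iters, elapsed_s, current_cores, measured_rate):
        """Return the (possibly fractional) cores for the next control period.

        measured_rate is the throughput in iterations/s observed during the
        last period while the job was running on current_cores.
        """
        remaining_iters = max(self.total_iters - done_iters, 0)
        remaining_s = max(self.deadline_s - elapsed_s, 1e-6)
        # Throughput needed to finish exactly at the deadline.
        required_rate = remaining_iters / remaining_s
        # Per-core throughput estimated from the last observation
        # (assumption: throughput scales linearly with cores).
        per_core_rate = max(measured_rate / current_cores, 1e-9)
        wanted = required_rate / per_core_rate
        # Clamp to the resources that can actually be granted.
        return min(max(wanted, self.min_cores), self.max_cores)


def simulate(controller, iters_per_core_s=2.0, period_s=10.0, start_cores=1.0):
    """Run the controller against the toy linear-scaling job; return finish time."""
    done, elapsed, cores = 0.0, 0.0, start_cores
    while controller.total_iters - done > 1e-9:
        rate = iters_per_core_s * cores
        remaining = controller.total_iters - done
        step = min(period_s, remaining / rate)   # do not run past job completion
        done += rate * step
        elapsed += step
        if controller.total_iters - done > 1e-9:
            cores = controller.next_allocation(done, elapsed, cores, rate)
    return elapsed
```

Because the controller reads deadline_s at every control period, changing that attribute while the job runs mimics a runtime deadline change in this toy setting. A deadline that cannot be met with max_cores simply saturates the allocation and the job finishes late.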

Link to Preprint

https://re.public.polimi.it/retrieve/9c22bd5e-187c-49c4-8e46-1104d152361d/SEAMS_25___Dynamic_Resource_Allocation_for_Deadline_Constrained__Neural_Network_Training-11.pdf

Luciano Baresi

Politecnico di Milano

Italy

Marco Garlini

Politecnico di Milano

Giovanni Quattrocchi

Politecnico di Milano

Italy

Session Program

Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Session 3: Resource Allocation (Research Track) at 204. Chair(s): Matteo Camilli (Politecnico di Milano)

14:00 25m Talk		Dynamic Resource Allocation for Deadline-Constrained Neural Network Training (FULL, Research Track). Luciano Baresi (Politecnico di Milano), Marco Garlini (Politecnico di Milano), Giovanni Quattrocchi (Politecnico di Milano). Pre-print available.
14:25 25m Talk		Integrating Performance Prediction, Anomaly Prediction and Root-Cause Localization for Self-Healing Software Systems (FULL, Research Track). Hamza Hussain (York University), Ghadeer Abuoda (York University), Marin Litoiu (York University, Canada)
14:50 25m Talk		WasteLess: An Optimal Provisioner for Self-Adaptive Second-Generation Serverless Applications (FULL, Research Track). Emilio Incerto (IMT School for Advanced Studies Lucca), Roberto Pizziol (IMT School for Advanced Studies Lucca), Gabriele Russo Russo (University of Rome Tor Vergata, Italy), Mirco Tribastone (IMT Institute for Advanced Studies Lucca, Italy)
15:15 15m Other		Discussion, Session 3 (Research Track)