Dynamic Resource Allocation for Deadline-Constrained Neural Network Training
FULL
This program is tentative and subject to change.
Neural Networks (NNs) serve as the backbone for various applications, including computer vision, speech recognition, and natural language processing. Due to their iterative nature, training NNs is a highly compute-intensive task that is typically executed using a statically allocated set of devices (e.g., CPUs or GPUs). This static allocation prevents adjusting priorities, making it impossible to reassign resources to urgent tasks and potentially causing high-priority training jobs to miss their expected completion times.
This paper proposes DECOR-NN (DEadline COnstrained Resource allocation for Neural Networks), a control mechanism for NN training that dynamically allocates resources according to a user-defined deadline (i.e., a Service Level Agreement), ensuring the training phase completes within the specified time. The solution leverages control theory and has been developed on top of PyTorch, a widely used framework for training NNs. DECOR-NN dynamically allocates either GPUs or fractions of CPUs to meet user deadlines and also allows users to modify the deadline at runtime to accommodate changes in job priorities. A comprehensive empirical evaluation using three benchmark applications demonstrates that DECOR-NN successfully completes training jobs with an average deviation from the deadline of only 1.75%.
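To make the idea of deadline-driven control concrete, the following is a minimal sketch of a feedback loop that adjusts a fractional CPU allocation so that a training job tracks its deadline. It is not the DECOR-NN controller, whose design is not described in the abstract: the class name `DeadlineController`, the proportional-integral update rule, the gain values, and the core limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeadlineController:
    """Hypothetical PI-style controller; NOT the DECOR-NN implementation."""

    total_iters: int          # iterations the training job must complete
    deadline_s: float         # user-defined deadline, in seconds from job start
    kp: float = 0.5           # proportional gain (illustrative value)
    ki: float = 0.1           # integral gain (illustrative value)
    min_cores: float = 0.1    # smallest CPU fraction we allow (assumption)
    max_cores: float = 16.0   # largest allocation we allow (assumption)
    integral: float = field(default=0.0, init=False)

    def required_rate(self, done_iters: int, elapsed_s: float) -> float:
        """Iterations per second needed to finish exactly at the deadline."""
        remaining_iters = max(self.total_iters - done_iters, 0)
        remaining_s = max(self.deadline_s - elapsed_s, 1e-6)
        return remaining_iters / remaining_s

    def step(self, cores: float, done_iters: int,
             elapsed_s: float, measured_rate: float) -> float:
        """Return the allocation to use for the next control period."""
        target = self.required_rate(done_iters, elapsed_s)
        if target == 0.0:
            # Nothing left to train: release resources.
            return self.min_cores
        # Relative error: positive when the job is running too slowly.
        error = (target - measured_rate) / target
        self.integral += error
        # Scale the current allocation by the PI correction.
        proposed = cores * (1.0 + self.kp * error + self.ki * self.integral)
        # Clamp to the range the (hypothetical) cluster can provide.
        return min(max(proposed, self.min_cores), self.max_cores)


if __name__ == "__main__":
    ctrl = DeadlineController(total_iters=1000, deadline_s=100.0)
    # The job is running at half the required rate, so the allocation grows.
    print(ctrl.step(cores=2.0, done_iters=0, elapsed_s=0.0, measured_rate=5.0))
    # Changing the deadline at runtime only requires updating one field.
    ctrl.deadline_s = 50.0
    print(ctrl.required_rate(done_iters=0, elapsed_s=0.0))
```

In this sketch, a runtime deadline change is handled by overwriting `deadline_s`; the next call to `step` recomputes the required rate and corrects the allocation accordingly. How the returned value is enforced (for example, through container CPU quotas or GPU assignment) is left out, since the abstract does not specify it.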
Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)
14:00 - 15:30
14:00 | 25m | Talk | Dynamic Resource Allocation for Deadline-Constrained Neural Network Training | FULL | Research Track | Luciano Baresi (Politecnico di Milano), Marco Garlini (Politecnico di Milano), Giovanni Quattrocchi (Politecnico di Milano) | Pre-print
14:25 | 25m | Talk | Integrating Performance Prediction, Anomaly Prediction and Root-Cause Localization for Self-Healing Software Systems | FULL | Research Track
14:50 | 25m | Talk | WasteLess: An Optimal Provisioner for Self-Adaptive Second-Generation Serverless Applications | FULL | Research Track | Emilio Incerto (IMT School for Advanced Studies Lucca), Roberto Pizziol (IMT School for Advanced Studies Lucca), Gabriele Russo Russo (University of Rome Tor Vergata, Italy), Mirco Tribastone (IMT Institute for Advanced Studies Lucca, Italy)
15:15 | 15m | Other | Discussion Session 3 | Research Track