APLAS 2025
Mon 27 - Thu 30 October 2025 Bengaluru, India

This program is tentative and subject to change.

As High-Performance Computing (HPC) workloads increasingly migrate to cloud infrastructures, intelligent, real-time scheduling becomes critical. Conventional schedulers such as first-come, first-served (FCFS) or shortest-job-first (SJF) often struggle to adapt to the heterogeneous and dynamic nature of cloud-based systems, leading to inefficient resource utilization and longer job wait times. This paper proposes a unified artificial intelligence (AI)-based framework that addresses these challenges by integrating three core capabilities: job runtime prediction using supervised learning, anomaly detection via deep autoencoders, and adaptive resource scheduling using reinforcement learning. Using real-world data from the MIT SuperCloud dataset, which contains over 2 TB of CPU and GPU performance traces, our system extracts patterns from time-series telemetry to support informed scheduling decisions. The job prediction module estimates runtimes from CPU utilization, memory consumption, and I/O statistics. The anomaly detection module flags resource-wasting or abnormal jobs using learned GPU performance norms. The reinforcement learning scheduler dynamically matches jobs to compute nodes based on predicted duration and anomaly status, optimizing for turnaround time and utilization. Experimental evaluations demonstrate a 28% reduction in average turnaround time and an increase of over 10% in resource utilization compared to traditional schedulers. These results establish the viability of AI-driven orchestration strategies in HPC cloud platforms and underscore the importance of integrated learning-based systems in achieving scalable, efficient, and context-aware workload management.
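To make the turnaround-time metric in the abstract concrete, here is a minimal sketch that compares FCFS ordering with ordering by a predicted runtime on a single node. It is not the paper's reinforcement learning scheduler. The `Job` fields, the `average_turnaround` helper, and the three sample jobs are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float    # submission time (hypothetical units)
    runtime: float    # actual runtime once started
    predicted: float  # runtime estimate, e.g. from a prediction model (assumed)


def average_turnaround(jobs, key):
    """Run jobs one at a time on a single node.

    At each step the next job is chosen among those that have already
    arrived, using `key` to rank them. Returns the mean turnaround time
    (completion time minus arrival time).
    """
    pending = sorted(jobs, key=lambda j: j.arrival)
    clock = 0.0
    total = 0.0
    while pending:
        ready = [j for j in pending if j.arrival <= clock]
        if not ready:
            # Node is idle: jump ahead to the next arrival.
            clock = pending[0].arrival
            continue
        job = min(ready, key=key)
        pending.remove(job)
        clock += job.runtime
        total += clock - job.arrival
    return total / len(jobs)


# Illustrative workload only; not taken from the MIT SuperCloud dataset.
jobs = [
    Job("a", arrival=0, runtime=8, predicted=7.5),
    Job("b", arrival=0, runtime=2, predicted=2.5),
    Job("c", arrival=0, runtime=4, predicted=4.0),
]

fcfs = average_turnaround(jobs, key=lambda j: j.arrival)    # arrival order
sjf = average_turnaround(jobs, key=lambda j: j.predicted)   # shortest predicted first

print(f"FCFS average turnaround: {fcfs:.2f}")
print(f"Predicted-SJF average turnaround: {sjf:.2f}")
```

In this toy workload, ordering by predicted runtime lowers the average turnaround because short jobs no longer wait behind a long one. The figures reported in the abstract come from the authors' own evaluation and are not reproduced by this sketch.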

Wed 29 Oct

Displayed time zone: Chennai, Kolkata, Mumbai, New Delhi

14:00 - 15:00
AI and Compiler Optimisation for Performance
Research Papers at APLAS room
Chair(s): Meenakshi D'Souza IIITB - International Institute of Information Technology Bangalore
14:00
30m
Talk
ELTC: An End-to-End Large Language Model-Based Tensor Compilation Optimization Framework
Research Papers
Wenbo Ma Tiangong University, Qingzeng Song Tiangong University, Yongjiang Xue Tiangong University, Fei Qiao Tsinghua University, Mingze Sun Tiangong University
14:30
30m
Talk
Performance Optimization of HPC Workloads in Cloud Using AI-Driven Algorithms
Research Papers
Aman Iftekhar IIT Patna, Rahul Mishra IIT Patna