TierTrain: Proactive Memory Tiering for CPU-Based DNN Training
Deep neural networks (DNNs) are among the most popular models for learning relationships in complex data. Training a DNN model is a compute- and memory-intensive operation. Modern DNN models span into the terabyte range, requiring multiple accelerators to train and driving up the training cost. Such enormous memory requirements shift the bottleneck from computation to memory.
CPU memory, on the other hand, can be scaled to several terabytes with emerging memory technologies such as HBM and CXL-attached memories. Furthermore, recent CPU advancements, such as dedicated instructions for DNN training and inference, are bridging the compute gap between CPUs and accelerators.
We present an exploratory work toward cost-effective DNN training on CPUs, aiming to alleviate the memory management challenges of DNN training. We propose TierTrain, a novel memory tiering solution based on a dynamic queuing system that leverages the periodic and deterministic memory access behavior of DNN training to manage data placement across memory tiers. TierTrain proactively manages tensors by aggressively offloading them to slow memory tiers (NVMM, CXL) and prefetching them back to fast memory tiers (HBM, DRAM) in a timely manner. Our evaluation of TierTrain on a tiered memory system, with real CXL-attached memory used for memory expansion and NVMM as a low-cost memory, shows an average fast-memory footprint reduction of 59–83% and a peak fast-memory footprint reduction of 25–74%, with a performance overhead of 1–16%. In a memory-constrained scenario, TierTrain outperforms state-of-the-art tiering, improving performance by 35–84% for a set of popular DNN training models.
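The proactive offload/prefetch discipline the abstract describes can be illustrated with a short sketch. The Python below is a minimal illustration under assumptions, not the authors' implementation: the names (TensorMeta, move, train_step) are hypothetical, and the policy shown, eagerly offloading activations after the forward pass and prefetching them one layer ahead during the backward pass, is one plausible instantiation of the periodic, deterministic reuse pattern that a queue-based scheme like TierTrain could exploit.

```python
# Hypothetical sketch of queue-based proactive tiering for DNN training.
# In training, an activation produced in the forward pass is next touched
# in the backward pass, in reverse layer order, so its reuse point is
# known in advance and placement can be scheduled proactively.

from collections import deque

class TensorMeta:
    def __init__(self, name, size):
        self.name = name    # tensor identifier
        self.size = size    # size in bytes
        self.tier = "fast"  # "fast" (HBM/DRAM) or "slow" (CXL/NVMM)

def move(tensor, tier):
    # Stand-in for a real migration primitive (e.g., page migration or copy).
    tensor.tier = tier
    print(f"{tensor.name}: -> {tier} memory")

def train_step(layers):
    # Forward pass: once a layer's activation is produced, it is idle until
    # the matching backward step, so offload it to slow memory aggressively.
    offload_queue = deque()
    for layer in layers:
        act = TensorMeta(f"{layer}.act", size=1 << 20)
        offload_queue.append(act)
        move(act, "slow")              # aggressive offload

    # Backward pass: layers are revisited in reverse order, so prefetch the
    # next-needed activation back to fast memory one step ahead of its use.
    acts = list(offload_queue)
    move(acts[-1], "fast")             # bring in the first activation needed
    for i in reversed(range(len(acts))):
        if i > 0:
            move(acts[i - 1], "fast")  # timely prefetch for the next step
        assert acts[i].tier == "fast"  # prefetch landed before the compute
        # ... compute gradients using acts[i] ...

train_step(["conv1", "conv2", "fc"])
```

Because the queue order mirrors the layer order, the same structure also tells the scheduler when a tensor is safe to evict, which is what lets a policy like this cut the fast-memory footprint without stalling the backward pass.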
Tue 17 Jun (displayed time zone: Seoul)

15:40 - 17:05 | Session 4 [Systems and Architecture], ISMM 2025, at Lilac
Chair(s): Steve Blackburn (Google and Australian National University)

15:40 (20m, Talk) | Fully Randomized Pointers
Sai Dhawal Phaye, Gregory J. Duck, Roland H. C. Yap, and Trevor E. Carlson (National University of Singapore)

16:00 (20m, Talk) | TierTrain: Proactive Memory Tiering for CPU-Based DNN Training
Sathvik Swaminathan, Sandeep Kumar, Aravinda Prasad, and Sreenivas Subramoney (Intel Labs)

16:20 (20m, Talk) | EMD: Fair and Efficient Dynamic Memory De-bloating of Transparent Huge Pages (Recorded)
Parth Gangar (Fujitsu Research of India), Ashish Panwar (Microsoft Research India), and K. Gopinath (Rishihood University)

16:40 (20m, Talk) | Compiler-Assisted Crash Consistency for PMEM (Recorded)
Yun Joon Soh (University of California San Diego), Sihang Liu (University of Waterloo), Steven Swanson (University of California San Diego), and Jishen Zhao (University of California San Diego)

17:00 (5m, Day closing) | Closing remarks