How GoldMiner Boosts Deep Learning Training by Up to 12× with Elastic Data Pre‑Processing
GoldMiner, a new system from Alibaba Cloud’s PAI platform, elastically scales deep learning data pre‑processing pipelines, improving training performance by up to 12.1× and GPU cluster utilization by 2.5×. The underlying research was accepted at SIGMOD 2023.
The paper “GoldMiner: Elastic Scaling of Training Data Pre‑Processing Pipelines for Deep Learning”, written by Alibaba Cloud’s Machine Learning Platform PAI in collaboration with Professor Yang Zhi’s team at Peking University, was recently accepted at SIGMOD 2023. It demonstrates that elastically scaling deep learning data pre‑processing pipelines can greatly improve both training performance and cluster resource utilization.
SIGMOD Context
SIGMOD is a premier international conference in the database and data‑management systems field and has influenced both academia and industry since its inception in 1975. In recent years SIGMOD has emphasized cross‑disciplinary work, especially at the intersection of data management with machine learning and artificial intelligence, the area to which GoldMiner contributes.
Problem Background
While GPU accelerators and software optimizations have continuously boosted deep learning computation efficiency, training remains a multi‑stage, multi‑resource workload. In addition to heavy GPU‑based training, a CPU‑side data pre‑processing pipeline (e.g., data augmentation, feature transformation) is essential for high‑quality models. As GPU training gets faster, the CPU‑side pre‑processing stage has to supply data at a higher rate, and when it cannot keep up it becomes the new performance bottleneck.
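The bottleneck argument can be made concrete with a small back‑of‑the‑envelope model. The sketch below is illustrative only: the function name and all numbers are hypothetical and do not come from the paper. It assumes the two stages are pipelined, so end‑to‑end throughput is set by the slower stage.

```python
def pipeline_throughput(gpu_rate, cpu_workers, per_worker_rate):
    """Return end-to-end samples/s for a two-stage pipelined training job.

    Assumption: pre-processing and training overlap perfectly, so the
    slower of the two stages sets the pace.
    """
    preprocess_rate = cpu_workers * per_worker_rate
    return min(gpu_rate, preprocess_rate)


# Hypothetical setup: 8 CPU workers, each pre-processing 250 samples/s.
CPU_WORKERS = 8
PER_WORKER_RATE = 250.0

for gpu_rate in (1000.0, 2000.0, 4000.0):
    actual = pipeline_throughput(gpu_rate, CPU_WORKERS, PER_WORKER_RATE)
    gpu_busy = actual / gpu_rate
    print(f"GPU can train {gpu_rate:.0f}/s -> job runs at {actual:.0f}/s "
          f"(GPU busy {gpu_busy:.0%})")
```

Under these assumed numbers, a GPU that could consume 4000 samples/s is only kept half busy, because the fixed pool of CPU workers tops out at 2000 samples/s.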
GoldMiner Solution
Observing that data‑pre‑processing pipelines are stateless and inherently elastic, GoldMiner separates the pipeline from model training. It automatically analyzes computation graphs to identify stateless preprocessing tasks and provides efficient parallel acceleration and elastic scaling, alleviating the preprocessing bottleneck and boosting overall training performance. By co‑designing with the cluster scheduler, GoldMiner further exploits the elasticity of preprocessing resources, dramatically improving scheduling efficiency. Experiments show GoldMiner can increase training performance by up to 12.1× and GPU‑cluster utilization by 2.5×.
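The two ideas in this paragraph, finding the stateless part of a computation graph and sizing its worker pool elastically, can be sketched as follows. This is a minimal illustration under stated assumptions, not GoldMiner’s actual implementation: the graph representation, the op names, the `offloadable_nodes` and `workers_needed` helpers, and all numbers are hypothetical.

```python
import math
from collections import deque


def offloadable_nodes(edges, stateful):
    """Return ops that could run outside the training process.

    edges: dict mapping each op to the list of ops that consume its output.
    stateful: set of ops that hold state (for example, model weights).
    An op is offloadable if it is stateless and every upstream op is
    offloadable too, so the offloaded part is a stateless prefix of the graph.
    """
    nodes = set(edges) | {dst for dsts in edges.values() for dst in dsts}
    preds = {n: set() for n in nodes}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].add(src)

    # Kahn-style topological traversal so predecessors are visited first.
    indegree = {n: len(preds[n]) for n in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    offload = set()
    while ready:
        node = ready.popleft()
        if node not in stateful and preds[node] <= offload:
            offload.add(node)
        for dst in edges.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return offload


def workers_needed(gpu_rate, per_worker_rate, max_workers):
    """Smallest worker count that keeps the GPU fed, capped by free capacity."""
    return min(max_workers, math.ceil(gpu_rate / per_worker_rate))


# Hypothetical training graph: a linear input pipeline feeding the model.
graph = {
    "read": ["decode"],
    "decode": ["augment"],
    "augment": ["batch"],
    "batch": ["forward"],
    "forward": ["update"],
}
print(sorted(offloadable_nodes(graph, stateful={"forward", "update"})))
print(workers_needed(gpu_rate=4000.0, per_worker_rate=250.0, max_workers=64))
```

Because the offloaded ops carry no state, workers can be added or removed between batches without any checkpointing, which is what makes this stage a natural candidate for elastic scaling by a cluster scheduler.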
Integration with PAI‑DLC
Alibaba Cloud’s PAI platform is integrating GoldMiner with PAI‑DLC to offer users accelerated data‑pre‑processing capabilities. PAI provides a cloud‑native, end‑to‑end machine‑learning solution covering interactive modeling (PAI‑DSW), visual modeling (PAI‑Designer), distributed training (PAI‑DLC), and online model deployment (PAI‑EAS). PAI‑DLC delivers a flexible, stable, and high‑performance training environment supporting multiple frameworks and massive distributed deep‑learning workloads.
Paper Details
Title: GoldMiner: Elastic Scaling of Training Data Pre‑Processing Pipelines for Deep Learning
Authors: Zhao Hanyu, Yang Zhi, Cheng Yu, Tian Chao, Ren Shirong, Xiao Wencong, Yuan Man, Chen Langshi, Liu Kaibo, Zhang Yang, Li Yong, Lin Wei
PDF: https://dl.acm.org/doi/pdf/10.1145/3589773