How GoldMiner Boosts Deep Learning Training by Up to 12× with Elastic Data Pre‑Processing

GoldMiner, a new system from Alibaba Cloud’s PAI platform, elastically scales deep‑learning data pre‑processing pipelines, improving training performance by up to 12.1× and GPU‑cluster utilization by 2.5×. The underlying research was accepted at SIGMOD 2023.

Alibaba Cloud Big Data AI Platform

Alibaba Cloud’s Machine Learning Platform PAI, in collaboration with Professor Yang Zhi’s team at Peking University, recently had the paper “GoldMiner: Elastic Scaling of Training Data Pre‑Processing Pipelines for Deep Learning” accepted at SIGMOD 2023. The paper shows that elastically scaling the data pre‑processing pipelines used in deep‑learning training can substantially improve both training performance and cluster resource utilization.

SIGMOD Context

SIGMOD is a premier international conference in the database and data‑management systems field and has influenced both academia and industry since its inception in 1975. In recent years, SIGMOD has placed growing emphasis on cross‑disciplinary work, especially at the intersection with machine learning and artificial intelligence, which is exactly where GoldMiner’s contribution sits.

Problem Background

While GPU accelerators and software optimizations have steadily improved the computational efficiency of deep learning, training remains a multi‑stage workload that draws on several types of resources. Besides the GPU‑heavy training itself, a CPU‑side data pre‑processing pipeline (e.g., data augmentation, feature transformation) is essential for producing high‑quality models. As GPU training gets faster, the CPU‑side pipeline must deliver data at an ever higher rate, and pre‑processing becomes a new performance bottleneck.
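To make the bottleneck concrete, here is a minimal sketch that models a training job as a two‑stage pipeline whose steady‑state throughput is bounded by the slower stage. The function names and all rates are hypothetical illustration values, not figures from the paper, and the model ignores real‑world effects such as queueing, stragglers, and imperfect parallel scaling.

```python
import math


def pipeline_throughput(gpu_samples_per_s: float,
                        cpu_worker_samples_per_s: float,
                        num_workers: int) -> float:
    """Steady-state samples/s of a two-stage pipeline: the slower stage wins.

    Assumes identical, perfectly parallel pre-processing workers (hypothetical model).
    """
    preprocessing_rate = cpu_worker_samples_per_s * num_workers
    return min(gpu_samples_per_s, preprocessing_rate)


def workers_to_saturate_gpu(gpu_samples_per_s: float,
                            cpu_worker_samples_per_s: float) -> int:
    """Smallest number of pre-processing workers that keeps the GPU fully busy."""
    return math.ceil(gpu_samples_per_s / cpu_worker_samples_per_s)


if __name__ == "__main__":
    # Hypothetical rates: the GPU consumes 3,000 images/s,
    # one CPU worker decodes and augments 250 images/s.
    gpu_rate, per_worker = 3000.0, 250.0
    for n in (4, 8, 12, 16):
        t = pipeline_throughput(gpu_rate, per_worker, n)
        print(f"{n:2d} workers -> {t:6.0f} samples/s, GPU busy {t / gpu_rate:.0%}")
    print("workers needed:", workers_to_saturate_gpu(gpu_rate, per_worker))
```

With these assumed rates, the GPU is only fully used from 12 workers onward; doubling `gpu_rate` to 6,000 samples/s doubles the requirement to 24 workers, which is the growing pressure on pre‑processing described in the paragraph above.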

GoldMiner Solution

Observing that data‑pre‑processing pipelines are stateless and inherently elastic, GoldMiner separates the pipeline from model training. It automatically analyzes computation graphs to identify stateless preprocessing tasks and provides efficient parallel acceleration and elastic scaling, alleviating the preprocessing bottleneck and boosting overall training performance. By co‑designing with the cluster scheduler, GoldMiner further exploits the elasticity of preprocessing resources, dramatically improving scheduling efficiency. Experiments show GoldMiner can increase training performance by up to 12.1× and GPU‑cluster utilization by 2.5×.
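The idea of separating stateless pre‑processing from training can be illustrated with a toy sketch under simplifying assumptions: the model is a DAG of named ops, each tagged with a hypothetical `stateful` flag, and an op is treated as offloadable pre‑processing if it is stateless and everything upstream of it is stateless too. The `Op` type, the flag, and the example graph are all invented for illustration; this is not GoldMiner’s actual analysis, which operates on real framework computation graphs.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    stateful: bool  # hypothetical flag: True if the op holds persistent state (e.g. model weights)
    inputs: list[str] = field(default_factory=list)


def offloadable_ops(graph: dict[str, Op]) -> set[str]:
    """Ops that are stateless AND whose entire upstream is stateless (graph assumed acyclic)."""
    memo: dict[str, bool] = {}

    def visit(name: str) -> bool:
        if name not in memo:
            op = graph[name]
            memo[name] = (not op.stateful) and all(visit(i) for i in op.inputs)
        return memo[name]

    return {name for name in graph if visit(name)}


def cut_edges(graph: dict[str, Op]) -> list[tuple[str, str]]:
    """Edges where pre-processed data leaves the offloadable subgraph and enters training."""
    pre = offloadable_ops(graph)
    return [(src, op.name)
            for op in graph.values() if op.name not in pre
            for src in op.inputs if src in pre]


# Hypothetical example: a tiny image-classification job.
EXAMPLE_GRAPH = {
    "read": Op("read", False),
    "decode": Op("decode", False, ["read"]),
    "augment": Op("augment", False, ["decode"]),
    "weights": Op("weights", True),
    "forward": Op("forward", False, ["augment", "weights"]),
    "loss": Op("loss", False, ["forward"]),
    "update": Op("update", True, ["loss", "weights"]),
}

if __name__ == "__main__":
    print("offloadable:", sorted(offloadable_ops(EXAMPLE_GRAPH)))
    print("cut edges:", cut_edges(EXAMPLE_GRAPH))
```

Here `read`, `decode`, and `augment` are offloadable, and the single cut edge `("augment", "forward")` marks where pre‑processed batches would be shipped to the trainer. Because the offloaded subgraph holds no state, any number of identical workers can run it on different samples, which is the property that makes elastic scaling of the pre‑processing side possible.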

Integration with PAI‑DLC

Alibaba Cloud’s PAI platform is integrating GoldMiner with PAI‑DLC to offer users accelerated data‑pre‑processing capabilities. PAI provides a cloud‑native, end‑to‑end machine‑learning solution covering interactive modeling (PAI‑DSW), visual modeling (PAI‑Designer), distributed training (PAI‑DLC), and online model deployment (PAI‑EAS). PAI‑DLC delivers a flexible, stable, and high‑performance training environment supporting multiple frameworks and massive distributed deep‑learning workloads.

Paper Details

Title: GoldMiner: Elastic Scaling of Training Data Pre‑Processing Pipelines for Deep Learning

Authors: Zhao Hanyu, Yang Zhi, Cheng Yu, Tian Chao, Ren Shirong, Xiao Wencong, Yuan Man, Chen Langshi, Liu Kaibo, Zhang Yang, Li Yong, Lin Wei

PDF: https://dl.acm.org/doi/pdf/10.1145/3589773

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
