Tagged articles
4 articles
Tencent Cloud Developer
Sep 1, 2021 · Artificial Intelligence

Why Distributed Machine Learning Accelerates AI Training at Scale

This article reviews how distributed machine learning handles massive data and compute demands: it partitions models and data across workers, optimizes communication through communication primitives, parameter servers, and Ring AllReduce, reduces I/O overhead, and applies optimizers such as LARS and LAMB to make training faster and more scalable.

LAMB optimizer · LARS optimizer · Parameter Server
31 min read
Alibaba Cloud Developer
Jun 12, 2019 · Artificial Intelligence

How Alibaba’s PAISoar Accelerates Deep Learning: 101× Speedup on 128 GPUs

Alibaba engineers detail the PAISoar distributed training framework, showing how RDMA-optimized hardware, Ring AllReduce algorithms, and user-friendly APIs deliver a 101-fold speedup on 128 GPUs for deep-learning models such as the GreenNet CNN, cutting training time from days to under a day.

AI Infrastructure · Deep Learning · Distributed Training
17 min read
Alibaba Cloud Developer
Jun 5, 2017 · Artificial Intelligence

Alibaba’s Distributed Training Boosts Neural Machine Translation Speed

Neural Machine Translation (NMT) has approached human quality since its 2013 debut, but training remains costly. In 2017, Alibaba’s team developed a distributed NMT system that uses data parallelism, model averaging, BMUF, Downpour SGD, and Ring AllReduce to cut training time from over 20 days to a few days while maintaining translation quality.

BMUF · Distributed Training · Downpour SGD
18 min read