Tagged articles
3 articles
Baidu Geek Talk
Oct 31, 2022 · Artificial Intelligence

PaddleBox: A GPU‑Based Ultra‑Large‑Scale Sparse DNN Training Framework

PaddleBox is Baidu’s GPU‑based ultra‑large‑scale sparse DNN training framework that combines a three‑tier hierarchical parameter server (SSD, DRAM, HBM) with pipelined scheduling and multi‑machine multi‑GPU communication, delivering 5–40× cost‑performance gains over traditional CPU solutions and powering Baidu’s advertising services.

Deep Learning · GPU · PaddleBox
15 min read
DataFunTalk
Dec 23, 2021 · Artificial Intelligence

Deep Customization and Optimization of TensorFlow for Large-Scale Sparse Training at Meituan

This article details Meituan's internal, heavily customized TensorFlow 1.x implementation that addresses large‑scale sparse parameter support, distributed training challenges, communication bottlenecks, and pipeline optimizations, achieving over ten‑fold scalability improvements and significant per‑node performance gains in recommendation system workloads.

Distributed Training · Sparse Parameters · TensorFlow
32 min read
Meituan Technology Team
Dec 9, 2021 · Artificial Intelligence

Deep Customization of TensorFlow for Large-Scale Sparse Training at Meituan

Meituan heavily customized TensorFlow 1.x for large‑scale sparse training: it replaced variable‑based embeddings with hash tables, improved load balancing, adopted RDMA communication, pipelined the embedding graph, built high‑performance hash tables, and merged operators. The result is more than ten‑fold better scalability and operator speedups of up to 51%, enabling models with billions of parameters on CPU clusters, with GPU expansion planned.

Distributed Training · Recommendation Systems · Sparse Parameters
31 min read