Meituan Technology Team
Apr 4, 2018 · Artificial Intelligence
Performance Optimization of Distributed TensorFlow for WDL Models at Meituan
Meituan‑Dianping identified data‑pipeline, network, and memory‑arena bottlenecks in distributed TensorFlow training of Wide & Deep (WDL) recommendation models. The team resolved them by switching to tf.data pipelines, batching TFRecord reads, increasing MALLOC_ARENA_MAX, and moving embedding lookups to the parameter servers, achieving a 2–3× speedup and near‑linear scaling on up to 32 GPUs.
AFO · Distributed Training · Performance Optimization
