Apr 4, 2018 · Artificial Intelligence

Performance Optimization of Distributed TensorFlow for WDL Models at Meituan

Meituan-Dianping identified data-pipeline, network, and memory-arena bottlenecks in distributed TensorFlow training of Wide & Deep (WDL) recommendation models. The team resolved them by switching to tf.data pipelines, batching TFRecord reads, increasing MALLOC_ARENA_MAX, and moving embedding lookups to the parameter servers, which yielded a 2–3× speedup and near-linear scaling on up to 32 GPUs.

AFO · Performance Optimization · TensorFlow

12 min read
