DaTaobao Tech
Sep 7, 2022 · Artificial Intelligence
Online Deep Learning (ODL) Model Optimization for Real‑Time Recommendation
The team improved real-time recommendation serving by redesigning its TensorFlow graphs: applying constant folding, adding a custom CallGraphOP cache, simplifying the dense layer, and making the graph compatible with CUDA Graph. Single-machine throughput rose by about 40%, GPU utilization rose from 30% to 43%, latency dropped, and roughly 30% of hardware resources were saved.
CUDA Graph · GPU performance · Online Deep Learning