iQIYI Technical Product Team
Nov 27, 2020 · Artificial Intelligence
Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems
By adding model warm‑up files, separating load/unload threads, switching to the Jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.
AI inferenceModel Hot UpdateTensorFlow Serving
0 likes · 11 min read