Tag

Model Hot Update

0 views collected around this technical thread.

iQIYI Technical Product Team
iQIYI Technical Product Team
Nov 27, 2020 · Artificial Intelligence

Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems

By adding model warm‑up files, separating load/unload threads, switching to the Jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.

AI inferenceModel Hot UpdateTensorFlow Serving
0 likes · 11 min read
Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems