
Memory Leak Diagnosis and Fixes for TensorFlow Serving in iQIYI’s Deep Learning Platform

The iQIYI deep‑learning platform identified two TensorFlow Serving memory‑leak problems—a string‑accumulating executor map caused by unordered input maps and an uncontrolled gRPC thread surge under heavy load—submitted upstream patches that sort inputs and cap thread counts, eliminating OOM crashes and stabilizing production.

iQIYI Technical Product Team

TensorFlow Serving is a high‑performance inference system open‑sourced by Google and widely used in click‑through‑rate (CTR) prediction scenarios because of its stability and convenience. However, the iQIYI deep‑learning platform observed that the serving containers sometimes experience continuous memory growth, eventually leading to OOM (out‑of‑memory) crashes.

This article describes two distinct memory‑leak problems discovered in TensorFlow Serving, their root causes, and the patches submitted to the upstream projects.

Background

TensorFlow Serving supports both gRPC and HTTP interfaces, multi‑model and multi‑version deployments, and hot model updates. iQIYI also open‑sourced XGBoost Serving, which inherits these features.

In production, the platform frequently received OOM alerts from the container runtime, and heap profiling with gperftools revealed that memory usage kept increasing without bound.

Issue 1 – DirectSession::GetOrCreateExecutors Memory Leak

Profiling showed that DirectSession::GetOrCreateExecutors creates a large number of string objects as keys of an unordered_map member named executors_, which maps signature keys to ExecutorsAndKeys. Each distinct ordering of the input feature names produces a new key — with 10 input features there are 10! ≈ 3.6 million possible orderings — so the map can grow to consume hundreds of megabytes or even gigabytes of memory.

The underlying cause is that the PredictRequest sent to TensorFlow Serving carries its inputs in a map<string, TensorProto> field. Protocol Buffers does not define an iteration order for map fields, and a recent client change caused the feature order to vary between requests. Because the order changed, the server could not find a matching entry in executors_ and kept inserting new keys, leading to the leak.
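The mechanism can be illustrated with a small, hypothetical Python sketch (not TensorFlow code): a cache keyed by the concatenation of input names in request order gains a new entry for every ordering of the same feature set.

```python
import itertools

# Hypothetical stand-in for DirectSession's executors_ map: the key is
# built by concatenating input names in the order the request sent them.
executors = {}

def get_or_create_executor(input_names):
    key = ",".join(input_names)  # order-sensitive cache key
    if key not in executors:
        executors[key] = object()  # stand-in for ExecutorsAndKeys
    return executors[key]

# Same four features, but the client sends them in varying order:
inputs = ["user_id", "item_id", "ctx_hour", "device"]
for perm in itertools.permutations(inputs):
    get_or_create_executor(list(perm))

print(len(executors))  # 4! = 24 distinct cache entries for one signature
```

With four features the cache already holds 24 entries; with 10 features the worst case is 10! ≈ 3.6 million, matching the growth observed in production.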

To fix the problem, two pull requests were submitted:

tensorflow/tensorflow#39743 – modifies the GetOrCreateExecutors implementation.

tensorflow/serving#1638 – sorts the inputs map inside TensorFlow Serving before processing, eliminating the need for repeated string concatenations.
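The effect of the sorting fix can be modeled with the same hypothetical sketch: normalizing the cache key by sorting the input names collapses every ordering of the same feature set into a single entry.

```python
import itertools

executors = {}  # hypothetical stand-in for the executors_ map

def get_or_create_executor(input_names):
    # Fix modeled after tensorflow/serving#1638: sort the input names
    # so all orderings of the same feature set map to one key.
    key = ",".join(sorted(input_names))
    if key not in executors:
        executors[key] = object()  # stand-in for ExecutorsAndKeys
    return executors[key]

inputs = ["user_id", "item_id", "ctx_hour", "device"]
for perm in itertools.permutations(inputs):
    get_or_create_executor(list(perm))

print(len(executors))  # 1 entry, regardless of request order
```

Sorting is an O(n log n) cost per request, negligible next to inference, and it makes the cache size depend on the set of features rather than on their arrival order.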

Issue 2 – gRPC Thread Explosion Under High Concurrency

During traffic spikes, the serving containers also experienced OOM due to a rapid increase in the number of gRPC worker threads. Monitoring showed that the number of grpcpp_sync_ser threads grew dramatically, eventually exhausting container memory.

Investigation revealed that the gRPC synchronous server spawns a new worker thread for an incoming request whenever no idle worker is available and the resource quota has not been reached. Under heavy load, many threads remain alive simultaneously, each with its own stack, consuming large amounts of memory.
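The thread-per-request behavior can be simulated with a stdlib-only Python sketch (gRPC itself is C++; this only models the growth pattern): a burst of 50 concurrent requests, each holding a worker thread for the duration of a slow call, drives the live thread count to roughly the burst size.

```python
import threading
import time

peak = 0
lock = threading.Lock()

def handle_request():
    # Stand-in for a slow inference call that pins a worker thread.
    global peak
    with lock:
        peak = max(peak, threading.active_count())
    time.sleep(0.2)

# Thread-per-request model: a burst of 50 requests spawns up to
# 50 live worker threads at once, each with its own stack.
workers = [threading.Thread(target=handle_request) for _ in range(50)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(peak)  # close to 51 (50 workers plus the main thread)
```

At a typical 8 MB default stack reservation per thread, a few thousand such threads can exhaust a container's memory limit on their own.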

The resolution was to add a maximum thread limit to the gRPC server. A pull request was submitted to TensorFlow Serving:

tensorflow/serving#1785 – introduces a resource‑quota‑based limit on the number of gRPC threads.

The authors recommend that any code using a gRPC server should enforce a maximum thread count to prevent service collapse during traffic bursts.
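That recommendation can be sketched with Python's stdlib thread pool (the actual patch uses gRPC's C++ resource quota, not this API): the same 50-request burst is served by a fixed pool, so the live thread count stays bounded no matter how large the burst is.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0
lock = threading.Lock()

def handle_request():
    global peak
    with lock:
        peak = max(peak, threading.active_count())
    time.sleep(0.01)

# Capped worker pool, modeling the idea behind tensorflow/serving#1785:
# 50 queued requests are served by at most 8 worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(handle_request) for _ in range(50)]
    for f in futures:
        f.result()

print(peak)  # bounded by 8 workers plus the main thread
```

Excess requests queue instead of spawning threads, trading a little latency under bursts for a hard ceiling on memory.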

Conclusion

The two memory‑leak issues were fixed, and the TensorFlow Serving service has been stable in production since the patches were merged. The article also provides references to the relevant GitHub repositories and documentation.

Tags: performance optimization, gRPC, memory leak, AI infrastructure, model serving, TensorFlow Serving