Memory Leak Diagnosis and Fixes for TensorFlow Serving in iQIYI’s Deep Learning Platform
The iQIYI deep‑learning platform traced two TensorFlow Serving memory leaks: an executor cache that accumulated strings because Protocol Buffers maps deliver inputs in arbitrary order, and an uncontrolled surge of gRPC worker threads under heavy load. Patches submitted upstream sort the inputs and cap the thread count, eliminating the OOM crashes and stabilizing the service in production.
TensorFlow Serving is a high‑performance inference system open‑sourced by Google and widely used in click‑through‑rate (CTR) prediction scenarios because of its stability and convenience. However, the iQIYI deep‑learning platform observed that the serving containers sometimes experience continuous memory growth, eventually leading to OOM (out‑of‑memory) crashes.
The article describes two distinct memory‑leak problems discovered in TensorFlow Serving, the root causes, and the patches submitted to the upstream projects.
Background
TensorFlow Serving supports both gRPC and HTTP interfaces, multi‑model and multi‑version deployments, and hot model updates. iQIYI also open‑sourced XGBoost Serving, which inherits these features.
In production, the platform frequently received OOM reports from the container runtime, and profiling with gperftools revealed that the memory footprint kept growing without bound.
Issue 1 – DirectSession::GetOrCreateExecutors Memory Leak
Profiling showed that DirectSession::GetOrCreateExecutors creates a large number of string objects in an unordered_map named executors_, which maps a key built from the request's input names to ExecutorsAndKeys. When a request carries many input features, the number of possible orderings explodes (10 features already allow 10! ≈ 3.6 million permutations), and each previously unseen ordering inserts a new entry, consuming hundreds of megabytes or even gigabytes of memory.
The underlying cause is that the PredictRequest sent to TensorFlow Serving carries its inputs in a map<string, TensorProto>. Protocol Buffers does not define an iteration order for map entries, and a recent client change caused the feature order to vary between requests. Because the derived key changed with the order, the server never found a matching entry in executors_ and kept inserting new ones, producing the leak.
To fix the problem, two pull requests were submitted:
tensorflow/tensorflow#39743 – modifies the GetOrCreateExecutors implementation.
tensorflow/serving#1638 – sorts the inputs map inside TensorFlow Serving before the executor key is built, so requests containing the same features always map to the same executors_ entry.
Issue 2 – gRPC Thread Explosion Under High Concurrency
During traffic spikes, the serving containers also experienced OOM due to a rapid increase in the number of gRPC worker threads. Monitoring showed that the number of grpcpp_sync_ser threads grew dramatically, eventually exhausting container memory.
Investigation revealed that the gRPC server creates a new worker thread for each incoming request when the resource quota is not reached. Under heavy load, many threads remain alive simultaneously, consuming large amounts of memory.
The resolution was to add a maximum thread limit to the gRPC server. A pull request was submitted to TensorFlow Serving:
tensorflow/serving#1785 – introduces a resource‑quota‑based limit on the number of gRPC threads.
The authors recommend that any code using a gRPC server should enforce a maximum thread count to prevent service collapse during traffic bursts.
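In the C++ gRPC API, such a cap can be expressed with a ResourceQuota attached to the ServerBuilder. The sketch below is a minimal server‑configuration fragment under assumed values: the thread limit of 64, the quota name, and the listening port are placeholders, not values taken from the article or the patch:

```cpp
#include <memory>
#include <grpcpp/grpcpp.h>
#include <grpcpp/resource_quota.h>

// Sketch: cap the number of worker threads the synchronous gRPC server
// may create. The limit (64) and the service are placeholders.
void BuildCappedServer(grpc::Service* service) {
  grpc::ResourceQuota quota("serving_quota");
  quota.SetMaxThreads(64);  // hard ceiling on gRPC worker threads

  grpc::ServerBuilder builder;
  builder.SetResourceQuota(quota);
  builder.AddListeningPort("0.0.0.0:8500", grpc::InsecureServerCredentials());
  builder.RegisterService(service);

  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
  server->Wait();
}
```

With a quota in place, the sync server stops spawning new threads once the ceiling is reached and queues further requests instead, trading some tail latency under bursts for a bounded memory footprint.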
Conclusion
The two memory‑leak issues were fixed, and the TensorFlow Serving service has been stable in production since the patches were merged. The article also provides references to the relevant GitHub repositories and documentation.
iQIYI Technical Product Team