Analysis and Solutions for Load‑Balancing Issues in QLB‑4 Based TFServing Service Calls
An investigation of QLB‑4‑based TFServing calls revealed uneven traffic, stale routing after scaling, and idle servers, all traced to layer‑4 connection‑hash routing. The team therefore replaced QLB‑4 with a Consul‑driven client‑side load balancer that maintains a dynamic server pool, removes the need for client restarts, and cuts GPU waste.
Load balancing aims to distribute network requests or other workloads evenly across multiple machines, preventing some servers from being overloaded while others remain idle. It can be implemented in software (e.g., Nginx) or hardware.
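The even-distribution goal can be illustrated with the simplest scheduling algorithm, round robin. This is a minimal sketch (not the article's production code); the server names are placeholders:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distribute requests evenly across a fixed set of servers."""

    def __init__(self, servers):
        self._cycle = cycle(servers)

    def pick(self):
        # Each call returns the next server in turn, so over time
        # every server receives the same share of requests.
        return next(self._cycle)

lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
picks = [lb.pick() for _ in range(6)]
# Six requests land two-per-server: a, b, c, a, b, c.
```

With three servers and six requests, each server handles exactly two; no machine is overloaded while another sits idle.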
In iQIYI's content‑understanding platform, TFServing services are accessed via gRPC and deployed on the QAE platform, using QLB‑4 (a fourth‑layer TCP load balancer) as the load‑balancing solution. During large‑scale online inference, three main problems were observed:
QLB‑4 does not achieve true load balancing; traffic is unevenly distributed, causing some servers to receive significantly more requests than others.
When server instances change (including after a restart), existing clients keep sending traffic to the old instances until the clients themselves are restarted.
If the number of clients is smaller than the number of server instances, some servers never receive any requests.
Experiments were conducted by deploying both gRPC clients and servers on QAE and measuring inbound network traffic on each server. The results (illustrated in the original figures) confirmed the three issues: (a) a 2:1 traffic ratio between two servers, (b) a newly added server C received no traffic, and (c) when servers outnumber clients, server C remained idle.
Root‑cause analysis revealed that QLB‑4 operates at OSI layer 4 using a connection‑hash routing table. The table records the chosen backend for each client connection, so subsequent packets are always forwarded to the same server. This design leads to:
Static server assignment for each client (Problem 1).
Stale routing entries when server topology changes, requiring client restart to rebuild the table (Problem 2).
Fixed one‑to‑one mapping that wastes server resources when there are more servers than clients (Problem 3).
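The mechanism behind all three problems can be reproduced with a small simulation of connection-hash routing. This is an illustrative sketch, not QLB‑4's actual implementation; the hash choice and connection IDs are assumptions:

```python
import hashlib

class ConnectionHashRouter:
    """Layer-4 style routing: the first packet of a connection selects a
    backend by hashing the connection, the choice is recorded in a table,
    and all later packets of that connection reuse the cached entry."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.table = {}  # connection id -> chosen backend

    def route(self, conn_id):
        if conn_id not in self.table:
            digest = hashlib.md5(conn_id.encode()).hexdigest()
            self.table[conn_id] = self.backends[int(digest, 16) % len(self.backends)]
        return self.table[conn_id]

router = ConnectionHashRouter(["A", "B"])
first = router.route("client-1:5000")   # backend pinned for this connection

router.backends.append("C")             # scale out: server C joins the pool
after_scale = router.route("client-1:5000")
# The cached entry still points at the original backend, so C stays idle
# until the client reconnects -- exactly Problems 2 and 3.
```

Because the table entry never expires, a long-lived gRPC connection is welded to one backend for its entire lifetime, regardless of how the server pool changes.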
Two solution paths were proposed:
Solution 1 – Optimizing QLB‑4
Improve the hash routing table by clearing it whenever the server pool changes or by adding an expiration time to entries. Alternatively, remove the hash table entirely and perform real‑time scheduling, though this may reduce throughput under high concurrency.
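The expiration idea can be sketched as a routing table whose entries carry a TTL, so a stale mapping eventually lapses and the next lookup triggers fresh scheduling. This is a hypothetical illustration of the proposal, not QLB‑4 code; the 30-second TTL is an arbitrary example:

```python
import time

class ExpiringRouteTable:
    """Hash routing table whose entries expire, so a flow can be
    re-scheduled after the server pool changes instead of being
    pinned to a stale backend forever."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # conn_id -> (backend, expiry timestamp)

    def put(self, conn_id, backend, now=None):
        now = time.monotonic() if now is None else now
        self._entries[conn_id] = (backend, now + self.ttl)

    def get(self, conn_id, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(conn_id)
        if entry is None or entry[1] < now:
            return None  # missing or expired: caller must re-schedule
        return entry[0]

table = ExpiringRouteTable(ttl_seconds=30)
table.put("client-1", "A", now=0)
cached = table.get("client-1", now=10)   # within TTL -> "A"
expired = table.get("client-1", now=40)  # past TTL -> None, re-route
```

The trade-off noted above still applies: dropping the table entirely and scheduling every packet in real time maximizes freshness but adds per-packet work, which can reduce throughput under high concurrency.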
Solution 2 – Client‑Side Load Balancing
Implement client‑side load balancing using the company’s service registry (Consul) and the Skywalker micro‑service framework. The client periodically fetches the list of healthy TFServing instances from Consul, builds a channel pool, and selects a server for each RPC using a chosen algorithm (e.g., round‑robin). This approach eliminates the extra hop introduced by QLB‑4, reduces latency, and automatically adapts to server changes without requiring client restarts.
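The client-side approach can be sketched as follows. This is a simplified model, not the Skywalker framework's API: `fetch_healthy` is a hypothetical callable standing in for a Consul health query (e.g. `GET /v1/health/service/<name>?passing=true`), the addresses are placeholders, and real code would hold a gRPC channel per address rather than a bare string:

```python
import itertools
import threading

class ClientSideBalancer:
    """Periodically refresh the healthy server list from a registry
    (e.g. Consul) and round-robin each RPC over the current pool."""

    def __init__(self, fetch_healthy):
        self._fetch = fetch_healthy
        self._lock = threading.Lock()
        self._servers = []
        self._rr = None
        self.refresh()

    def refresh(self):
        # Invoked on a timer in a real client; rebuilding the pool here
        # is what lets scale-ups take effect without a client restart.
        servers = sorted(self._fetch())
        with self._lock:
            if servers != self._servers:
                self._servers = servers
                self._rr = itertools.cycle(servers)

    def pick(self):
        # Choose a server for one RPC (round-robin as an example policy).
        with self._lock:
            return next(self._rr)

registry = {"10.0.0.1:8500", "10.0.0.2:8500"}
lb = ClientSideBalancer(lambda: registry)
before = {lb.pick() for _ in range(4)}   # both servers receive traffic

registry.add("10.0.0.3:8500")            # server C registers in Consul
lb.refresh()                             # next poll picks it up
after = {lb.pick() for _ in range(6)}    # all three servers receive traffic
```

Because each RPC is scheduled on the client, there is no intermediate hop and no per-connection routing table to go stale: a newly registered server starts receiving traffic on the next refresh.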
After evaluating both options, the team adopted the client‑side load‑balancing solution. The final deployment stabilized TFServing calls, eliminated uneven traffic distribution, removed the need for client restarts during server scaling, and reduced GPU resource waste, thereby lowering operational costs.
iQIYI Technical Product Team