Improving Load Balancing for a Compute‑Intensive Ticket Query Engine with a Pooling Strategy
The article analyzes why a round‑robin load‑balancing approach caused severe response‑time spikes in Ctrip's compute‑intensive international ticket query engine and demonstrates how switching to a proactive pooling model using a Redis‑backed queue eliminated the spikes and reduced average latency by about 20%.
Background
In a compute-intensive service, a single request can keep all of a server's CPU cores busy. If two requests run on the same server at once, they compete for the CPU, and both take longer to complete than either would alone. Traditional load-balancing methods such as round-robin or random selection ignore server load, so they can send several concurrent requests to one machine and degrade service quality.
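A back-of-the-envelope calculation makes the cost of sharing concrete. The sketch below assumes an idealized server on which one request alone saturates every core and concurrent requests split the CPU evenly (processor sharing, ignoring context-switch and cache overhead); the function names are illustrative and not taken from the engine. Two requests that each need one second of the whole machine both finish at t = 2 s when run together, a mean of 2 s, whereas running them back to back gives completion times of 1 s and 2 s, a mean of 1.5 s.

```python
def completion_times_shared(works):
    """Completion times when all requests start at t = 0 and split the CPU evenly.

    Idealized processor sharing: with k requests running, each gets 1/k of the
    machine. Returned times are ordered by increasing work, not input order.
    """
    times, now, served = [], 0.0, 0.0
    ordered = sorted(works)
    for i, work in enumerate(ordered):
        active = len(ordered) - i          # requests still running in this phase
        now += (work - served) * active    # wall time until the next one finishes
        served = work                      # CPU time every survivor has received
        times.append(now)
    return times


def completion_times_sequential(works):
    """Completion times when requests run one after another in the given order."""
    times, now = [], 0.0
    for work in works:
        now += work
        times.append(now)
    return times


if __name__ == "__main__":
    work = [1.0, 1.0]  # two requests, each needing 1 s of the whole machine
    shared = completion_times_shared(work)
    serial = completion_times_sequential(work)
    print(sum(shared) / len(shared), sum(serial) / len(serial))  # 2.0 1.5
```

Sharing never finishes the last request sooner; it only delays the earlier ones, which is why the average gets worse.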
After a recent refactor, Ctrip's international ticket query engine became fully compute-intensive, so each server should handle at most one request at a time, yet the load balancer was still round-robin. Monitoring showed persistent response-time spikes. They were caused by a small number of long-running “A‑type” requests (several seconds each) that blocked dozens of short-running “B‑type” requests (tens of milliseconds each) routed to the same server.
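The blocking effect can be reproduced with a toy simulation. This sketch assumes each server runs the requests it receives one at a time in arrival order, with times in milliseconds, and uses made-up numbers rather than Ctrip's data: two servers, one A-type request of 3000 ms at t = 0, then ten B-type requests of 20 ms arriving every 20 ms. Round-robin alternates servers regardless of load, so every other B-type request lands behind the A-type request and waits almost three seconds.

```python
def round_robin_latencies(requests, n_servers):
    """Latency (finish - arrival) of each request under round-robin dispatch.

    requests: (arrival_ms, service_ms) pairs. Request i, in arrival order, goes
    to server i % n_servers regardless of load; each server then runs its own
    requests one at a time, first come first served.
    """
    free_at = [0] * n_servers          # time at which each server becomes idle
    latencies = []
    for i, (arrival, service) in enumerate(sorted(requests)):
        server = i % n_servers
        start = max(arrival, free_at[server])
        free_at[server] = start + service
        latencies.append(free_at[server] - arrival)
    return latencies


if __name__ == "__main__":
    # One A-type request (3000 ms), then ten B-type requests (20 ms) every 20 ms.
    requests = [(0, 3000)] + [(t, 20) for t in range(20, 201, 20)]
    print(round_robin_latencies(requests, n_servers=2))
    # [3000, 20, 2980, 20, 2960, 20, 2940, 20, 2920, 20, 2900]
```

Half of the short requests finish in 20 ms and the other half in close to 3 s, even though the second server sits idle most of the time.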
Pooling Solution
To address the issue, a new load‑balancing strategy called pooling was introduced. Instead of passively receiving requests, servers actively pull requests from a global queue, ensuring that each server processes at most one request at a time.
The pooling architecture consists of three roles:
submitor: receives external calls, pushes each request onto the queue, and forwards the worker's result back to the caller.
queue: a single global Redis list used as a buffer; lpush adds requests at the head, and brpop blocks a worker until a request is available and then pops it from the tail, so requests are served in arrival (FIFO) order.
worker: loops continuously, taking one request with brpop, processing it, and returning the result to the submitor.
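The three roles can be sketched in a single Python process. This is an illustration under stated assumptions, not Ctrip's implementation: InMemoryList is a hypothetical thread-safe stand-in for the Redis list that mimics lpush (insert at the head) and brpop (block, then pop from the tail). Because the article does not say how results travel back to the submitor, each request here carries its own reply queue; in a real deployment the queue would be the shared Redis list (for example through redis-py's lpush and brpop), and the reply channel would need a serializable form such as a per-request reply key.

```python
import queue
import threading
from collections import deque


class InMemoryList:
    """Hypothetical stand-in for a Redis list (not Redis itself).

    lpush inserts at the head; brpop blocks until an item exists, then pops the
    tail, so items come out in the order they were pushed (FIFO).
    """

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def lpush(self, item):
        with self._cond:
            self._items.appendleft(item)
            self._cond.notify()            # wake one blocked worker, if any

    def brpop(self):
        with self._cond:
            while not self._items:         # loop guards against spurious wake-ups
                self._cond.wait()
            return self._items.pop()


class Submitor:
    """Accepts calls, enqueues them, and hands the worker's result back."""

    def __init__(self, work_queue):
        self._queue = work_queue

    def submit(self, payload, timeout=None):
        reply = queue.Queue(maxsize=1)         # hypothetical per-request reply channel
        self._queue.lpush((payload, reply))
        return reply.get(timeout=timeout)      # raises queue.Empty on timeout


def worker_loop(work_queue, handle):
    """Pull and process one request at a time until a None sentinel arrives."""
    while True:
        item = work_queue.brpop()              # blocks while the queue is empty
        if item is None:                       # shutdown convention of this sketch
            return
        payload, reply = item
        reply.put(handle(payload))


if __name__ == "__main__":
    q = InMemoryList()
    workers = [threading.Thread(target=worker_loop, args=(q, lambda n: n * n))
               for _ in range(3)]
    for w in workers:
        w.start()
    submitor = Submitor(q)
    print([submitor.submit(n) for n in range(5)])  # [0, 1, 4, 9, 16]
    for _ in workers:
        q.lpush(None)                          # one sentinel per worker
    for w in workers:
        w.join()
```

Because each worker takes a new request only after replying to the previous one, no worker ever runs two requests at once; adding capacity means starting more workers rather than reconfiguring the submitor.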
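This design guarantees that a server is either processing exactly one request or waiting for the next one. A short request therefore waits only until any worker becomes free, instead of queuing behind a long request on one particular server, which removes the contention that caused the spikes.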
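The effect of this guarantee can be checked with a small discrete-time simulation, an illustrative sketch with made-up numbers rather than Ctrip's data. It assumes a single global FIFO queue and that an idle worker pulls the head of the queue instantly (no Redis round-trip cost), so each request simply goes to whichever worker becomes idle first, which a min-heap of idle times models directly. With two workers, one A-type request of 3000 ms at t = 0 and ten B-type requests of 20 ms arriving every 20 ms, the A-type request occupies one worker while the other absorbs every B-type request, so none of them waits.

```python
import heapq


def pooling_latencies(requests, n_workers):
    """Latency (finish - arrival) of each request under pull-based pooling.

    requests: (arrival_ms, service_ms) pairs, served from one global FIFO queue.
    The worker that becomes idle first pulls the next request.
    """
    idle_at = [0] * n_workers          # min-heap of times at which workers go idle
    heapq.heapify(idle_at)
    latencies = []
    for arrival, service in sorted(requests):
        earliest = heapq.heappop(idle_at)      # first worker free to pull
        finish = max(arrival, earliest) + service
        heapq.heappush(idle_at, finish)
        latencies.append(finish - arrival)
    return latencies


if __name__ == "__main__":
    # One A-type request (3000 ms), then ten B-type requests (20 ms) every 20 ms.
    requests = [(0, 3000)] + [(t, 20) for t in range(20, 201, 20)]
    print(pooling_latencies(requests, n_workers=2))
    # [3000, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

Pooling does not add capacity: if requests arrive faster than the workers can serve them, latency still grows, but it grows evenly across the queue instead of piling up behind one long request.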
Effect
After switching from round‑robin to pooling, the average response time dropped by roughly 20% and the long‑tail spikes disappeared completely.
[Figure: round-robin mode]
[Figure: pooling mode]
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.