
How WeChat Keeps Billions of Requests Stable: Overload Control Strategies for Massive Microservices

This article breaks down WeChat’s 2018 overload control system for massive microservices, explaining the problem of service overload, detection via average waiting time, and a multi‑level priority‑based mitigation strategy that dynamically adjusts admission thresholds to keep billions of daily requests stable.

ITPUB

Background

WeChat serves over one billion monthly active users. Traffic spikes during holidays can cause service overload, yet the platform remains stable. The 2018 SOCC paper “Overload Control for Scaling WeChat Microservices” describes the overload‑protection mechanisms employed.

Definition of Overload

Overload occurs when request volume exceeds a service’s processing capacity. Server load rises, requests wait longer in the queue, and users see higher latency. Retries amplify the problem and can trigger cascading failures along the call chain.

Detection Metric

WeChat measures the average waiting time of requests in the RPC queue: the interval from a request’s arrival to the start of its processing. The average is taken over a batch of 2 000 requests; if it exceeds 20 ms, the service is considered overloaded. This metric is service‑agnostic. Response time and CPU usage can mislead in a chained microservice environment, because response time includes time spent in downstream calls and high CPU usage does not by itself mean requests are waiting.
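The detection rule can be sketched as a small windowed monitor. This is a minimal illustration, not WeChat's implementation: the class name `QueueTimeMonitor` and its fields are hypothetical, and closing the window on either 2 000 requests or 1 s is an assumption taken from the interval mentioned later in this article.

```python
from dataclasses import dataclass


@dataclass
class QueueTimeMonitor:
    """Tracks average queue waiting time over one window (hypothetical helper)."""
    threshold_ms: float = 20.0    # overload threshold quoted in the article
    window_requests: int = 2000   # window closes after this many requests...
    window_seconds: float = 1.0   # ...or after this much time (assumption)
    _sum_ms: float = 0.0
    _count: int = 0
    _window_start: float = 0.0

    def record(self, arrival_s: float, start_s: float):
        """Record one request.

        Returns True or False (overloaded or not) when a window closes,
        and None while the current window is still filling.
        """
        if self._count == 0:
            self._window_start = arrival_s
        self._sum_ms += (start_s - arrival_s) * 1000.0
        self._count += 1
        window_full = self._count >= self.window_requests
        window_expired = start_s - self._window_start >= self.window_seconds
        if not (window_full or window_expired):
            return None
        overloaded = self._sum_ms / self._count > self.threshold_ms
        self._sum_ms, self._count = 0.0, 0  # start a fresh window
        return overloaded
```

Only arrival and processing-start timestamps are needed, which is why the same monitor could sit in front of any service regardless of its business logic.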

Control Loop

When overload is detected, the acceptance rate is cut by a ≈ 0.05 (about 5 %) at each adjustment. Once the waiting‑time metric falls back below the threshold, the acceptance rate is raised by only b ≈ 0.01 (about 1 %) at each adjustment. Cutting faster than recovering lets this negative‑feedback loop damp traffic spikes quickly while limiting oscillation.
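One step of this loop might look like the sketch below. The function name is hypothetical, and applying a and b multiplicatively to the current rate is an assumption; the article only gives the two factors.

```python
ALPHA = 0.05  # cut applied on overload (the article's a)
BETA = 0.01   # increase applied on recovery (the article's b)


def next_acceptance_rate(rate: float, overloaded: bool) -> float:
    """Return the acceptance rate for the next window, clamped to [0, 1].

    Assumption: both factors scale the current rate multiplicatively.
    """
    if overloaded:
        rate *= 1.0 - ALPHA
    else:
        rate *= 1.0 + BETA
    return min(1.0, max(0.0, rate))


# Three overloaded windows followed by three healthy ones.
rate = 1.0
for overloaded in (True, True, True, False, False, False):
    rate = next_acceptance_rate(rate, overloaded)
```

After the loop the rate is still below 1.0: three recovery steps of 1 % do not undo three cuts of 5 %, which is the intended asymmetry.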

Priority Model

Each request receives two independent priority dimensions:

Business priority – a predefined ranking of business types (login > payment > messaging > timeline). The ranking is stored in a hash table; services not listed receive the lowest priority.

User priority – derived from a hash of the user’s unique ID, refreshed hourly to prevent priority gaming. The hash yields a value in the range 0‑127 (128 levels).
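A rough sketch of how the two dimensions could be assigned follows. The table values, the default of 0 for unlisted business types, the choice of SHA-256, and the convention that a larger number means higher priority are all assumptions made for illustration; the article specifies only the ordering, the hourly refresh, and the 128 user levels.

```python
import hashlib

# Hypothetical ranking table; larger number = higher priority (assumption).
BUSINESS_PRIORITY = {"login": 4, "payment": 3, "messaging": 2, "timeline": 1}
LOWEST_BUSINESS_PRIORITY = 0  # used for business types missing from the table
USER_LEVELS = 128


def business_priority(action: str) -> int:
    """Look up the business priority, falling back to the lowest level."""
    return BUSINESS_PRIORITY.get(action, LOWEST_BUSINESS_PRIORITY)


def user_priority(user_id: str, epoch_seconds: int) -> int:
    """Hash the user ID together with the current hour into 0..127.

    Mixing in the hour means a user's level is reshuffled every hour,
    so no user is permanently favoured or penalised.
    """
    hour = epoch_seconds // 3600
    digest = hashlib.sha256(f"{user_id}:{hour}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % USER_LEVELS
```

Within one hour the same user always maps to the same level, so a user who is admitted tends to stay admitted across consecutive requests.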

A service’s admission threshold is likewise a pair (B, U). A request is admitted if its business priority is higher than B, or if its business priority equals B and its user priority is higher than U.
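The admission rule is a lexicographic comparison, which a short sketch makes explicit. The `Priority` type and the "larger means higher" convention are assumptions of this example.

```python
from typing import NamedTuple


class Priority(NamedTuple):
    business: int  # larger = more important (assumption of this sketch)
    user: int      # 0..127, larger = higher (assumption of this sketch)


def is_admitted(request: Priority, threshold: Priority) -> bool:
    """Admit if business > B, or business == B and user > U."""
    return (request.business > threshold.business
            or (request.business == threshold.business
                and request.user > threshold.user))
```

Because `NamedTuple` instances compare like ordinary tuples, `request > threshold` gives the same result as the explicit form above.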

Adaptive Admission Adjustment

Each service maintains a histogram of request counts per (B, U) level over the last interval (1 s or 2 000 requests). When overload is detected, the service raises the admission threshold until the projected admitted volume drops by about 5 % (a ≈ 0.05). During recovery it lowers the threshold more gently, until the projected volume grows by about 1 % (b ≈ 0.01). Because the histogram gives the expected volume at each candidate threshold, this replaces costly linear or binary probing across thousands of priority levels.
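The histogram-based adjustment could be sketched as follows. This is an illustration under several assumptions: (B, U) is flattened to a single integer with the hypothetical helper `level`, a larger level means higher priority, a request is admitted when its level is strictly above the threshold, and the histogram counts every incoming request, including rejected ones.

```python
from collections import Counter

ALPHA, BETA = 0.05, 0.01  # the article's a and b
USER_LEVELS = 128


def level(business: int, user: int) -> int:
    """Flatten (B, U) to one integer; larger = higher priority (assumption)."""
    return business * USER_LEVELS + user


def adjust_threshold(hist: Counter, threshold: int, overloaded: bool) -> int:
    """Return the next threshold; requests with level > threshold are admitted."""
    admitted_now = sum(n for lvl, n in hist.items() if lvl > threshold)
    admitted = admitted_now
    if overloaded:
        # Shed the lowest admitted levels until volume falls by about ALPHA.
        target = (1 - ALPHA) * admitted_now
        for lvl in sorted(l for l in hist if l > threshold):
            if admitted <= target:
                break
            admitted -= hist[lvl]
            threshold = lvl
    else:
        # Re-admit the highest rejected levels until volume grows by about BETA.
        target = (1 + BETA) * admitted_now
        for lvl in sorted((l for l in hist if l <= threshold), reverse=True):
            if admitted >= target and admitted > admitted_now:
                break
            admitted += hist[lvl]
            threshold = lvl - 1
    return threshold
```

Since levels are discrete, each step can overshoot its target by up to one histogram bucket. The sketch also leaves the threshold unchanged when no traffic was observed at the levels it would move across.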

Upstream‑Downstream Coordination

When an upstream service calls a downstream service, the downstream returns its current admission priority. The upstream compares this value with the request’s (B, U); if the request would be rejected downstream, the upstream discards it locally, saving bandwidth and CPU.
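A minimal sketch of this coordination is shown below. The `UpstreamClient` class, the `send` callback, and the response shape (payload plus the downstream's current threshold) are hypothetical stand-ins for an RPC layer, and priorities are assumed to be (business, user) tuples where larger means higher.

```python
from typing import Callable, Dict, Optional, Tuple

Level = Tuple[int, int]                 # (business, user); larger = higher (assumption)
Response = Tuple[Optional[str], Level]  # (payload or None if rejected, downstream threshold)


class UpstreamClient:
    """Remembers each downstream's last reported threshold (hypothetical helper)."""

    def __init__(self) -> None:
        self.known_threshold: Dict[str, Level] = {}

    def call(self, service: str, priority: Level,
             send: Callable[[Level], Response]) -> Optional[str]:
        threshold = self.known_threshold.get(service)
        if threshold is not None and not priority > threshold:
            return None  # would be rejected downstream, so drop it locally
        payload, reported = send(priority)
        self.known_threshold[service] = reported  # refresh the record from the response
        return payload
```

One caveat of this simplified version: the record is refreshed only when a request is actually sent, so if every request is dropped locally the upstream never learns that the downstream has recovered. A real system would need some way to refresh or expire stale records.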

Load‑Control Flow

A client request reaches the access layer, where a single (B, U) priority is assigned and attached to all subsequent sub‑requests.

Each downstream service checks its local admission threshold; if the request fails the check it is dropped, otherwise it is processed.

Services periodically adjust their thresholds based on the histogram and the 20 ms waiting‑time metric.

Before forwarding a request, an upstream service consults the recorded admission priority of the downstream; if the request’s priority is lower, it is discarded locally.

The downstream includes its current admission priority in the response, allowing the upstream to update its record.

Key Characteristics

Business‑agnostic metric: average queue waiting time is independent of specific service logic.

Two‑dimensional priority: separates business importance from per‑user fairness.

Fast adaptive control: histogram‑based adjustment picks a new threshold after each interval (1 s or 2 000 requests) without exhaustive search.

Coordinated dropping: upstreams can pre‑emptively discard requests that would be rejected downstream, reducing unnecessary traffic.

Reference: https://www.cs.columbia.edu/~ruigu/papers/socc18-final100.pdf
