Design and Implementation of Nginx Overload Protection Using Lua
This article describes the background, design concepts, selection rationale, implementation principles, algorithm details, and configuration of a Lua‑based overload protection module for Nginx that monitors system load and dynamically rejects traffic to safeguard backend services.
Background: Baidu Waimai’s business clusters are deployed on a standard framework in which Nginx serves as the traffic entry point, handling flow control and shielding backend services. Because vanilla Nginx offers no custom overload protection, a purpose-built solution was developed.
Design Concept: Each product line monitors its own service status and protects itself based on traffic conditions. Resource load thresholds are configurable; when they are exceeded, Nginx rejects traffic. The design emphasizes minimal configuration, reduced manual intervention, system self-healing, and broad applicability.
Selection: Nginx is chosen for its high performance, low memory footprint, and stability. The ngx_lua module embeds a Lua interpreter, allowing business logic to be expressed in lightweight, coroutine-enabled scripts, reducing implementation cost while maintaining high concurrency.
Design Principle: Nginx reads real-time load metrics (CPU load, memory usage, disk I/O) via the Lua module, compares them against user-defined thresholds, and, if a threshold is crossed, applies a configurable rejection percentage to incoming requests.
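As a concrete illustration of the metric-reading step, the sketch below parses the 1-minute load average from /proc/loadavg in plain Lua. The function name and threshold value are assumptions for illustration, not the article's original code:

```lua
-- Hypothetical sketch: read the 1-minute load average on Linux.
local function get_cpu_load()
    local f = io.open("/proc/loadavg", "r")
    if not f then return nil end
    local line = f:read("*l")
    f:close()
    -- /proc/loadavg starts with the 1-, 5- and 15-minute averages
    return tonumber(line:match("^(%S+)"))
end

local cpu_threshold = 8.0            -- user-defined threshold (example value)
local load1 = get_cpu_load()
if load1 and load1 > cpu_threshold then
    -- system considered overloaded; a share of requests may be rejected
end
```

Memory and I/O metrics could be gathered the same way, e.g. by parsing /proc/meminfo and /proc/diskstats.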
Implementation Principle: When Nginx worker processes start, Lua allocates shared memory within Nginx, initializes configuration, and determines whether overload protection is enabled. A periodic timer (ngx.timer.at) gathers load metrics, evaluates them against thresholds, and updates an in-memory flag and rejection percentage.
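A minimal sketch of that worker-side timer loop is shown below. It assumes the shared dictionary name devicedb from the configuration section; the key names, interval, and threshold check are illustrative, not the article's original implementation:

```lua
-- Hypothetical sketch: periodic overload check, started once per worker
-- from an init_worker_by_lua* block. Runs only inside OpenResty/ngx_lua.
local check_interval = 1  -- seconds between checks (assumption)

local check_load
check_load = function(premature)
    if premature then return end  -- worker is shutting down
    local dict = ngx.shared.devicedb
    local load1 = get_cpu_load() -- e.g. parsed from /proc/loadavg
    if load1 and load1 > cpu_threshold then
        dict:set("overload", true)
        dict:set("reject_percent", 10)  -- configured rejection percentage
    else
        dict:set("overload", false)
    end
    -- reschedule ourselves to keep the check running
    local ok, err = ngx.timer.at(check_interval, check_load)
    if not ok then
        ngx.log(ngx.ERR, "failed to schedule overload check: ", err)
    end
end

local ok, err = ngx.timer.at(0, check_load)
if not ok then
    ngx.log(ngx.ERR, "failed to start overload check: ", err)
end
```

Storing the flag in a lua_shared_dict lets every worker's request handlers read the overload state without re-sampling the metrics on each request.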
Algorithm Details: Because traffic is not perfectly linear, a random-number approach is used. Once the overload flag and rejection rate are set, each request generates a random number; if it falls within the rejection percentage, the request is dropped. Tests with 1,000 requests showed an error (deviation from the configured rejection rate) of about 2%.
Random-Reject Code: The original article presented the Lua code for random rejection as an image; the logic follows the algorithm described above.
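Since the original code is only available as an image, here is a hedged reconstruction of the described random-reject logic as it might appear in an access_by_lua* handler. The dictionary keys and the 503 response are assumptions:

```lua
-- Hypothetical reconstruction of the random-reject step, NOT the
-- article's original source. Runs per request inside access_by_lua*.
local dict = ngx.shared.devicedb
if dict:get("overload") then
    local reject_percent = dict:get("reject_percent") or 10
    -- draw an integer in [1, 100]; rejecting when it falls inside the
    -- configured percentage drops ~reject_percent% of requests over time
    if math.random(100) <= reject_percent then
        return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)  -- respond 503
    end
end
```

Because each request is rejected independently with the configured probability, the observed rejection rate converges on the target only statistically, which matches the roughly 2% deviation reported in the 1,000-request test.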
Parameter Configuration:
lua_shared_dict devicedb 5m;
overload_flag on|off # default: off
overload_condition_cpu # default: nil
overload_condition_mem # default: nil
overload_condition_io # default: nil
overload_reject_percent # default: 10%
These directives configure the shared memory for Lua, enable or disable overload protection, set threshold conditions for CPU, memory, and I/O, and define the percentage of traffic to reject when overload is detected.
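Put together, a configuration might look like the fragment below. The directive names are those listed above; the threshold values and their units are example assumptions, since the article does not specify them:

```nginx
# Illustrative nginx.conf fragment; values are examples, not defaults.
http {
    lua_shared_dict devicedb 5m;       # shared memory for the Lua state

    overload_flag           on;       # enable overload protection
    overload_condition_cpu  8;        # e.g. 1-minute load average threshold
    overload_condition_mem  80;       # e.g. percent of memory in use
    overload_condition_io   90;       # e.g. percent I/O utilization
    overload_reject_percent 10;       # share of traffic to reject when overloaded
}
```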
Summary and Planning: The current implementation evaluates only local resource thresholds. Future work will extend overload protection to upstream module states, such as backend health, error rates, and timeouts, enabling product-level overload safeguards.
Baidu Waimai Technology Team
The Baidu Waimai Technology Team supports and drives the company's business growth. This account provides a platform for engineers to communicate, share, and learn. Follow us for team updates, top technical articles, and internal/external open courses.