JEN: JD Extended Nginx Platform for Scalable Management and Automation
The article introduces JEN, JD's extended Nginx platform that centralizes configuration, monitoring, traffic splitting, rate limiting and automated operations through a web console and Ansible integration, addressing the complexity, restart requirements, and scaling challenges of large‑scale Nginx deployments.
Nginx is an excellent HTTP and reverse‑proxy server widely used across JD departments, but it commonly suffers from complex configuration, the need to restart for changes, fragmented management of modules and settings, and difficulty scaling a single application’s Nginx instances.
Configuration is complex and requires expertise.
Configuration files cannot be batch‑modified and changes depend on restarts.
Different applications rely on different modules and settings, leading to chaotic management.
Scaling a single application’s Nginx quickly and in bulk is not possible.
These problems stem from Nginx being designed as a single‑node server; at the scale and pace of an internet company like JD, they are amplified. To address this, JD designed and developed JEN (JD Extended Nginx), which now powers most core services in JD Finance, such as Duobao Bar, Card Supermarket, and Baitiao.
Overall Architecture
Figure 1: JEN architecture diagram.
Operations staff use a web console to perform configuration tasks. Traffic‑splitting and rate‑limiting rules are stored in a database, and Nginx instances pull them via a RESTful API. For operations such as smooth upgrades or restarts, the console triggers Ansible to execute the required actions on the Nginx nodes.
Figure 2: Multi‑data‑center deployment of Nginx and the web console.
JEN characteristics:
Supports automatic Nginx discovery, group management, and status monitoring.
Provides a unified entry point with abstracted configuration, simplifying lifecycle control of Nginx clusters and enabling batch rule configuration and execution.
Extends native Nginx traffic‑splitting and rate‑limiting capabilities, allowing real‑time in‑memory rule synchronization without modifying configuration files or restarting processes.
1. Basic Information
All web‑based displays and operations are based on aggregated basic information, which includes two main categories:
Group information (business line, application, data center, Nginx IP).
Nginx attributes such as upstream definitions, server_name, listen_port, etc., collected from nginx.conf via heartbeat reporting.
Group information can be populated in two ways:
Importing complete data via an external service’s RESTful API.
Manually editing groups for automatically discovered Nginx instances.
Figure 3: Relationship diagram of groups.
The four‑layer hierarchy (business line → application → data center → Nginx) enables batch operations such as bulk configuration modification, mass upgrades, and restarts, greatly improving productivity.
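As an illustration of how the hierarchy can drive batch operations, the following sketch selects every Nginx node under a given group level. The `NginxNode` type, the `select_nodes` helper, and the sample data are hypothetical; the source does not describe JEN's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NginxNode:
    # One leaf of the business line -> application -> data center -> Nginx tree.
    business_line: str
    application: str
    data_center: str
    ip: str


def select_nodes(nodes, business_line=None, application=None, data_center=None):
    """Return the nodes that match every filter that is not None."""
    wanted = {
        "business_line": business_line,
        "application": application,
        "data_center": data_center,
    }
    return [
        node
        for node in nodes
        if all(value is None or getattr(node, field) == value
               for field, value in wanted.items())
    ]


# Hypothetical inventory, for illustration only.
NODES = [
    NginxNode("finance", "baitiao", "dc-a", "10.0.0.1"),
    NginxNode("finance", "baitiao", "dc-b", "10.0.1.1"),
    NginxNode("finance", "cards", "dc-a", "10.0.0.2"),
]
```

Selecting by `application="baitiao"` returns both data centers' nodes, so a single console action can target the whole application or be narrowed to one data center.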
2. Rule Retrieval
After a user configures rules in the web console, each Nginx instance asynchronously fetches its relevant rules from the server. Rules are stored in memory for immediate effect, with each process holding its own copy to avoid lock contention. A version‑number design guarantees strict ordering, preventing inconsistencies caused by packet loss or latency, and eliminates unnecessary CPU usage when rules are unchanged.
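A minimal sketch of how a version number can keep rule synchronization ordered and cheap, assuming the server returns a monotonically increasing `version` together with the rules. The `RuleCache` class and the response shape are hypothetical; the source does not describe JEN's wire format.

```python
class RuleCache:
    """Per-process rule copy guarded by a version number (illustrative only)."""

    def __init__(self):
        self.version = 0
        self.rules = {}

    def apply(self, response):
        """Apply a fetched response; return True only if the rules changed."""
        incoming = response["version"]
        if incoming <= self.version:
            # Stale, delayed, or duplicate response: skip it without any
            # parsing or rebuilding work.
            return False
        self.rules = dict(response["rules"])
        self.version = incoming
        return True
```

Because older versions are rejected, a delayed response that arrives after a newer one cannot overwrite it, and an unchanged version costs only one comparison.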
3. Security
JEN defines three role types, each with distinct permissions (default is a read‑only user). All operations are logged, and the web UI provides multi‑dimensional audit log queries for compliance and troubleshooting.
4. Monitoring
Comprehensive monitoring data is collected and displayed, including:
a) Extension of the tengine active probing module to capture average and current latency of upstream servers.
b) Heartbeat‑based Nginx liveness monitoring.
c) TCP connection statistics, inbound/outbound traffic, QPS, and response code distribution (1xx‑5xx).
These metrics can be aggregated by group (business line, application, data center) and visualized on large‑screen dashboards for real‑time operational insight.
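Aggregation by group can be sketched as a simple roll-up over per-node samples. The sample fields and the `aggregate` helper are assumptions made for illustration; they are not JEN's reporting schema.

```python
from collections import defaultdict


def aggregate(samples, by, metric="qps"):
    """Sum one metric over all samples that share the same group key."""
    totals = defaultdict(int)
    for sample in samples:
        key = tuple(sample[field] for field in by)
        totals[key] += sample[metric]
    return dict(totals)


# Hypothetical per-node samples, for illustration only.
SAMPLES = [
    {"business_line": "finance", "application": "baitiao", "data_center": "dc-a", "qps": 100},
    {"business_line": "finance", "application": "baitiao", "data_center": "dc-b", "qps": 200},
    {"business_line": "finance", "application": "cards", "data_center": "dc-a", "qps": 50},
]
```

Passing `by=("application",)` rolls the samples up per application, while `by=("application", "data_center")` keeps the data-center breakdown.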
2. Traffic Splitting
Concept: Based on request characteristics (IP, arbitrary header keywords), specific requests can be directed to one or more upstream servers.
Figure 4: Traffic‑splitting example.
This feature is useful for gray‑release, A/B testing, and other scenarios. JEN also adds one‑click enable/disable of upstream servers via the web console, facilitating maintenance or upgrades without disrupting user traffic.
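The matching concept above can be sketched as a first-match rule list. The rule fields (`cidr`, `header`, `keyword`, `upstream`) and the `pick_upstream` function are hypothetical names chosen for this example, not JEN's rule format.

```python
import ipaddress


def pick_upstream(rules, client_ip, headers, default="default_upstream"):
    """Return the upstream of the first rule that matches the request."""
    ip = ipaddress.ip_address(client_ip)
    for rule in rules:
        # IP-based rule: match when the client address falls inside the CIDR.
        if "cidr" in rule and ip in ipaddress.ip_network(rule["cidr"]):
            return rule["upstream"]
        # Header-based rule: match when the header value contains the keyword.
        header = rule.get("header")
        if header and rule["keyword"] in headers.get(header, ""):
            return rule["upstream"]
    return default


# Hypothetical rules: one gray-release network and one beta client build.
RULES = [
    {"cidr": "10.1.0.0/16", "upstream": "gray_pool"},
    {"header": "User-Agent", "keyword": "beta", "upstream": "beta_pool"},
]
```

Requests that match no rule fall through to the default upstream, so adding or removing a gray-release rule does not affect ordinary traffic.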
3. Rate Limiting
During large promotions (e.g., JD 618), traffic spikes can cause sudden QPS surges that overwhelm a few machines, leading to cascading failures. Effective rate limiting requires selecting appropriate algorithms (leaky bucket, token bucket) and tuning parameters based on historical traffic, service capacity, and marketing intensity.
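For reference, a token bucket can be sketched in a few lines. This is a generic, single-process illustration of the algorithm named above, not JEN's implementation; the `rate` and `capacity` parameters are the values that would be tuned from historical traffic and service capacity.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Return True and consume one token if the request may pass."""
        if now > self.last:
            # Refill for the elapsed time, never exceeding the capacity.
            refill = (now - self.last) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1` and `capacity=2`, two requests pass immediately, the third is rejected, and one more is admitted after a second has elapsed. Passing the timestamp explicitly keeps the sketch deterministic; a real caller would supply a monotonic clock.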
JEN implements rate limiting using shared memory to synchronize state across Nginx processes. The initial design stored rules per process, which caused inconsistencies and excessive memory consumption. The improved approach pre‑allocates shared memory, synchronizes rules in real time, and maintains a rule chain that removes old versions only after traffic has transitioned to the new one.
Figure 5: Rule chain.
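One way to read the rule chain is as a list of rule versions in which an old version is dropped only when no in-flight request still uses it. The sketch below models that interpretation with a per-version counter; the `RuleChain` class and its counting scheme are assumptions, since the source does not say how JEN detects that traffic has transitioned.

```python
class RuleChain:
    """Chain of rule versions; old versions linger until they are unused."""

    def __init__(self):
        self.versions = []  # oldest first, newest last

    def publish(self, version, rules):
        """Append a new version; new requests will use it from now on."""
        self.versions.append({"version": version, "rules": rules, "in_flight": 0})
        self._prune()

    def acquire(self):
        """Pin the newest version for one request (requires a published version)."""
        entry = self.versions[-1]
        entry["in_flight"] += 1
        return entry

    def release(self, entry):
        """Unpin a version when its request finishes."""
        entry["in_flight"] -= 1
        self._prune()

    def _prune(self):
        # Keep the newest version plus any older one still serving requests.
        newest = self.versions[-1]
        self.versions = [
            e for e in self.versions if e is newest or e["in_flight"] > 0
        ]
```

A request that started under version 1 keeps a consistent view of its counters even if version 2 is published midway; version 1 disappears as soon as that request is released.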
Rate‑limiting extensions include:
Custom error pages: besides static Nginx pages, JEN supports 302 redirects or, using Nginx sub‑request mechanisms, returns custom content while keeping the original URL unchanged, improving user experience during throttling.
Figure 6: Comparison of two error‑page approaches.
Extended algorithms that block an offending IP for a configurable cooldown period after it triggers rate limiting.
Integrated blacklist/whitelist functionality to prevent false positives, especially in NAT environments.
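The cooldown and list extensions can be sketched together. This example uses a simple fixed-window counter per IP rather than a leaky or token bucket, and every name in it (`CooldownLimiter`, `limit`, `window`, `cooldown`) is hypothetical; it only illustrates the order of checks.

```python
class CooldownLimiter:
    """Fixed-window limiter that blocks an offending IP for a cooldown period."""

    def __init__(self, limit, window, cooldown, whitelist=(), blacklist=()):
        self.limit = limit          # requests allowed per window
        self.window = window        # window length in seconds
        self.cooldown = cooldown    # block duration after a violation
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.counters = {}          # ip -> (window_start, count)
        self.blocked_until = {}     # ip -> timestamp

    def allow(self, ip, now):
        if ip in self.whitelist:
            return True             # e.g. a known NAT gateway, never throttled
        if ip in self.blacklist:
            return False
        if now < self.blocked_until.get(ip, 0.0):
            return False            # still cooling down
        start, count = self.counters.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0   # begin a new window
        count += 1
        self.counters[ip] = (start, count)
        if count > self.limit:
            self.blocked_until[ip] = now + self.cooldown
            return False
        return True
```

Checking the whitelist first is what prevents false positives: many users behind one NAT address would otherwise trip the per-IP limit and be blocked together.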
4. Operational Features
Operational features cover Nginx installation, upgrades, configuration changes, start/stop actions, and more. Because these actions often require restarts, integrating Ansible with the web console provides a lightweight, low‑migration‑cost automation solution compared to heavier tools like Puppet.
In production, the web UI and Ansible are deployed as a clustered service; configuration data is stored in a database rather than local Ansible files, enabling easy scaling of the management suite.
Figure 7: Automated operation workflow.
Through the web console, users trigger Ansible to perform upgrades, restarts, and other tasks. The console captures Ansible’s standard output, formats it, and streams it to the UI for real‑time progress visibility. Output is enriched and structured to improve readability.
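Capturing a command's output while it runs can be sketched with the standard library. The `stream_command` helper is hypothetical, and the demo runs a small Python child process as a stand-in for an Ansible invocation; how JEN formats and forwards the lines to the UI is not described in the source.

```python
import subprocess
import sys


def stream_command(argv):
    """Yield a command's output line by line while it runs, then its exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
    )
    for line in proc.stdout:
        yield line.rstrip("\n")
    code = proc.wait()
    yield f"[exit code {code}]"


if __name__ == "__main__":
    # Stand-in child process; a console backend would pass its Ansible command.
    demo = [sys.executable, "-c", "print('step 1'); print('step 2')"]
    for line in stream_command(demo):
        print(line)
```

Because the function is a generator, a web backend can forward each line as it arrives instead of waiting for the run to finish. How promptly lines arrive still depends on the child process flushing its output.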
Reliability mechanisms include three‑layer error checking (form validation, a pre‑execution nginx -t configuration test, and post‑execution port and process verification) and gray‑scale execution (single‑node sequential rollout with immediate rollback on failure, or percentage‑based batch rollout per data center).
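The single-node sequential rollout can be sketched as a loop that stops at the first failed verification. The `rollout` function and its `apply`, `verify`, and `rollback` callbacks are hypothetical stand-ins for the Ansible tasks and the port/process checks. Rolling back every node touched so far, rather than only the failing one, is an assumption of this sketch.

```python
def rollout(nodes, apply, verify, rollback):
    """Update nodes one at a time; on the first failure, undo and stop."""
    done = []
    for node in nodes:
        apply(node)
        done.append(node)
        if not verify(node):
            # Undo in reverse order, starting with the node that just failed.
            undone = list(reversed(done))
            for touched in undone:
                rollback(touched)
            return {"ok": False, "failed": node, "rolled_back": undone}
    return {"ok": True, "updated": done}
```

Nodes after the failing one are never touched, which limits the impact of a bad change to a single node at a time.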
5. Summary
The article outlines JD’s practical experience in platformizing Nginx with JEN, providing a unified entry point for the entire Nginx lifecycle, supporting batch rule modifications with instant effect, and delivering comprehensive monitoring, security, and automated operational capabilities.
Author Introduction
Wu Jianmiao currently works at the JD Finance Hangzhou R&D Center, where he is responsible for Nginx and MQ projects. He has a strong interest in high‑performance server development and tuning. Feel free to contact him via WeChat (wujm1230).
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.