How JD.com Scaled Its Product Detail Pages to Billions of Views
This article details JD.com's evolution of product detail page architecture—from early IIS/C# monoliths to a distributed, cache‑heavy, Nginx+Lua system—covering front‑end dimensions, performance metrics, design principles, scaling challenges, storage engine choices, multi‑datacenter deployment, and the lessons learned from numerous production pitfalls.
What Is a Product Detail Page
A product detail page displays comprehensive product information and serves as a major traffic and order entry point for JD.com. Multiple templates (general, global purchase, flash sale, automotive, clothing, group buy, etc.) share the same metadata but differ in presentation. Because of diverse personalization needs and data sources, a new architecture was designed, consisting of three parts: the static product detail system, a unified dynamic service, and a dynamic service for internal data provision.
Front‑End Structure
The front‑end is divided into dimensions such as product (title, images, attributes), main product (description, specs), category, merchant, shop, and high‑frequency data loaded asynchronously (price, promotion, delivery, pre‑sale, etc.).
Performance Data
During the 618 shopping festival, page views reached billions and server response time stayed below 38 ms at TP99, meaning that 99 out of every 100 requests completed within that time.
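To make the TP99 figure concrete, here is a minimal sketch of a nearest-rank percentile calculation. The latency samples are invented for illustration and are not JD.com measurements; the function name `percentile` is likewise an assumption.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 990 fast requests, 10 slow outliers.
latencies_ms = [20] * 990 + [400] * 10

tp99 = percentile(latencies_ms, 99)    # ignores the slowest 1% of requests
tp100 = percentile(latencies_ms, 100)  # the single worst request
```

With these made-up samples the TP99 is 20 ms even though the slowest requests took 400 ms, which is why TP99 is usually reported alongside the maximum rather than instead of it.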
Single‑Product Page Traffic Characteristics
Traffic is spread thinly across a very large number of products with few hotspots, and the pages are heavily crawled by bots and price‑comparison tools.
Evolution of Technical Architecture
Architecture 1.0
IIS + C# + SQL Server, later adding a memcached layer for caching. This setup suffered from performance jitter due to unstable dependent services.
Architecture 2.0
Introduced static HTML generation per product dimension. Workflow: 1) MQ notifies changes; 2) Java workers generate HTML; 3) rsync distributes files; 4) Nginx serves static pages; 5) load‑balancing at the access layer. Drawbacks included full re‑generation for category changes, rsync bottlenecks, and slow response to page‑level updates.
Architecture 3.0
Goals: rapid response to volatile requirements, support vertical page redesigns, modular pages, A/B testing, high performance, horizontal scalability, multi‑datacenter active‑active deployment. Main ideas: 1) MQ for data change notifications; 2) Heterogeneous workers store raw atomic data in JIMDB (Redis + persistent engine); 3) Synchronization workers aggregate data into JSON per dimension (basic info, product intro, other info); 4) Front‑end uses Nginx + Lua to fetch data and render templates; 5) A dynamic service layer provides key‑value data for any non‑relational use case.
Detail Page Architecture Design Principles
Data closed‑loop
Data dimensionalization
System decomposition
Stateless, task‑oriented workers
Asynchrony and concurrency
Multi‑level caching
Dynamic data acquisition
Elasticity
Degradation switches
Multi‑datacenter active‑active
Multiple load‑testing strategies
Data Closed‑Loop
All data is managed within the system without external dependencies. Data heterogenization imports external data as atomic records; data atomicization enables flexible re‑processing; data aggregation builds a single JSON for front‑end consumption; JIMDB stores data with Redis‑style sharding and persistence, supporting both key‑value and relational queries.
Data Dimensionalization
Data is split into: 1) Basic product info (title, attributes, images, specs); 2) Product intro info (merchant templates, descriptions); 3) Non‑product info (category, merchant, shop, brand); 4) Asynchronously loaded data (price, promotion, delivery, recommendations).
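The aggregation step that turns atomic records into one JSON document per dimension might look roughly like the sketch below. The field names and the `DIMENSION_FIELDS` mapping are hypothetical; the article does not publish JD.com's actual schema.

```python
import json

# Hypothetical mapping from atomic field names to the dimensions listed above.
DIMENSION_FIELDS = {
    "basic": ["title", "attributes", "images", "specs"],
    "intro": ["merchant_template", "description"],
    "other": ["category", "merchant", "shop", "brand"],
}

def aggregate(atomic):
    """Group atomic fields into one JSON string per dimension.
    Fields missing from the atomic record are simply skipped."""
    docs = {}
    for dimension, fields in DIMENSION_FIELDS.items():
        payload = {name: atomic[name] for name in fields if name in atomic}
        docs[dimension] = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    return docs

# Made-up atomic record for a single product.
atomic_record = {
    "title": "Example phone",
    "images": ["a.jpg"],
    "brand": "ExampleBrand",
    "description": "<p>Intro</p>",
}
docs = aggregate(atomic_record)
```

Because each dimension is stored separately, a change to the brand only requires rewriting the "other" document, not the basic or intro documents.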
System Decomposition
The architecture is divided into sub‑systems: data heterogenization, data synchronization, front‑end detail page, and product intro services, reducing inter‑dependency and allowing independent scaling.
Stateless Workers + Task Queues
Workers are stateless for horizontal scaling, with configuration per datacenter. Multiple queues handle waiting, deduplication, local execution, and failures, prioritized by normal, data‑refresh, and high‑priority (e.g., flash‑sale) queues. Replay queues enable post‑deployment data correction.
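A prioritized queue with deduplication could be sketched as follows. This is an in-memory illustration only, assuming three priority levels and a product identifier (`sku_id`) as the deduplication key; the real system's queue implementation is not described in the article.

```python
import heapq
import itertools

# Lower number means higher priority; the levels mirror the queues named above.
HIGH, REFRESH, NORMAL = 0, 1, 2

class TaskQueue:
    def __init__(self):
        self._heap = []
        self._pending = {}                 # sku_id -> priority currently queued
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, sku_id, priority=NORMAL):
        """Queue a task; return False if an equal or better one is already waiting."""
        queued = self._pending.get(sku_id)
        if queued is not None and queued <= priority:
            return False
        self._pending[sku_id] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), sku_id))
        return True

    def pop(self):
        """Return the next sku_id, or None when the queue is empty."""
        while self._heap:
            priority, _, sku_id = heapq.heappop(self._heap)
            # Skip stale entries left behind when a task was upgraded.
            if self._pending.get(sku_id) == priority:
                del self._pending[sku_id]
                return sku_id
        return None

queue = TaskQueue()
queue.push(1001)              # normal change notification
queue.push(2002, HIGH)        # e.g. a flash-sale product
duplicate_accepted = queue.push(1001)  # same product, same priority
order = [queue.pop(), queue.pop(), queue.pop()]
```

Deduplicating at enqueue time means a burst of change messages for one product results in a single regeneration rather than one per message.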
Multi‑Level Caching
The cache levels are: browser cache with Last‑Modified validation, CDN edge caching, an Nginx + Lua shared‑dict local cache, and a distributed cache (memory + SSD + JIMDB). Consistent hashing improves hit rates. Multi‑key reads (mget) check the local cache first and go to the remote cache only for the keys that miss.
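The local-first mget optimization can be illustrated with a small two-level cache. Both levels are plain dictionaries here, standing in for the Nginx shared dict and the distributed cache; expiry and eviction are deliberately left out, and the class name is an assumption.

```python
class TwoLevelCache:
    """Local in-process cache in front of a remote cache (both plain dicts here)."""

    def __init__(self, remote):
        self.local = {}
        self.remote = remote    # stand-in for a distributed cache client
        self.remote_calls = 0   # counts remote round trips, for illustration

    def mget(self, keys):
        # Serve whatever the local cache already holds.
        found = {k: self.local[k] for k in keys if k in self.local}
        missing = [k for k in keys if k not in found]
        if missing:
            # One batched remote lookup, for the misses only.
            self.remote_calls += 1
            for k in missing:
                if k in self.remote:
                    found[k] = self.local[k] = self.remote[k]
        return found

remote_store = {"sku:1": "basic-1", "sku:2": "basic-2"}
cache = TwoLevelCache(remote_store)
first = cache.mget(["sku:1", "sku:2", "sku:3"])  # all keys miss locally
second = cache.mget(["sku:1", "sku:2"])          # served entirely from local
```

The second call never touches the remote store, which is the effect the optimization is after: the remote round trip shrinks to the keys that genuinely need it.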
Dynamic Rendering
Data is fetched per dimension at request time, so templates can change without redeployment. Nginx + Lua restarts complete within seconds, which supports rapid feature rollout.
Elasticity
All services run in Docker containers with base images for quick provisioning; automatic scaling based on CPU or bandwidth is supported.
Degradation Switches
Centralized switch management pushes degradation flags to servers; multi‑level read services allow fallback from front‑end cache to data heterogenization to dynamic services.
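A degradation switch combined with a multi-level read path might be sketched like this. The source names and the `disabled` set are hypothetical; in the system described, the flags would be pushed from the central switch manager rather than passed as an argument.

```python
def read_with_fallback(key, sources, disabled):
    """Try each (name, fetch) source in order.
    Sources that are switched off, raise, or return None are skipped."""
    for name, fetch in sources:
        if name in disabled:
            continue
        try:
            value = fetch(key)
        except Exception:
            continue  # treat a failing source like a miss and fall through
        if value is not None:
            return name, value
    return None, None

# Stand-ins for the three read levels mentioned above.
frontend_cache = {}
heterogeneous_store = {"sku:1": {"title": "from store"}}

def dynamic_service(key):
    return {"title": "from service"}

sources = [
    ("frontend_cache", frontend_cache.get),
    ("heterogeneous_store", heterogeneous_store.get),
    ("dynamic_service", dynamic_service),
]

normal = read_with_fallback("sku:1", sources, disabled=set())
degraded = read_with_fallback("sku:1", sources, disabled={"heterogeneous_store"})
```

Flipping a switch only changes which sources are skipped, so the calling code stays the same whether the system is healthy or degraded.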
Multi‑Datacenter Active‑Active
Stateless applications read from the cluster in their own datacenter; a one‑master, three‑slave topology keeps data available when a site fails.
Load‑Testing Strategies
Offline testing with Apache ab and JMeter (single‑URL stress).
Online testing using Tcpcopy to replay real traffic, optionally amplified, or using Nginx + Lua coroutines for distributed load generation.
Encountered Issues and Solutions
SSD Performance Degradation
Consumer‑grade SSDs (e.g., Samsung 840 Pro) showed unstable throughput under RAID configurations. Testing revealed RAID mode and controller age as factors; proper RAID and SSD selection are critical.
Key‑Value Store Selection Benchmark
Benchmarked LevelDB, RocksDB, and LMDB with 1.7 billion records (5‑30 KB each). LMDB exhibited stable performance with minimal jitter, leading to its adoption; RocksDB performed well for pure reads/writes but jittered under mixed workloads.
JIMDB Synchronization Bottleneck
Data dump consumed >50 % of SSD capacity, causing sync stalls. Solutions included increasing SSD count, using SAS disks for dump, and planning in‑memory forwarding to avoid dumps.
Master‑Slave Switch Overload
Transitioning from a 1‑master‑2‑slave to 1‑master‑3‑slave topology alleviated load spikes during failover.
Shard Configuration Complexity
Introduced Twemproxy to centralize shard logic, along with an automated deployment pipeline that pushes configuration changes and pauses MQ consumption while they roll out, so that data stays consistent.
Template Metadata Storage
Initially stored full HTML fragments in JIMDB, causing massive rewrites on template changes. Shifted to storing only metadata; Lua renders templates at runtime, reducing TP99 from 53 ms to 32 ms.
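The difference between storing rendered HTML and storing only metadata can be shown with a small sketch. The production system renders in Lua; Python's `string.Template` is used here purely as a stand-in, and the template names and fields are invented.

```python
import html
from string import Template

# Templates live with the rendering code, not in the data store.
TEMPLATES = {
    "general": Template("<h1>$title</h1><p>Brand: $brand</p>"),
    "flash_sale": Template("<h1>[Flash sale] $title</h1><p>Brand: $brand</p>"),
}

def render(metadata):
    """Render a fragment at request time from stored metadata."""
    template = TEMPLATES[metadata["template_id"]]
    # Escape values so stored data cannot inject markup.
    safe_fields = {k: html.escape(str(v)) for k, v in metadata["fields"].items()}
    return template.substitute(safe_fields)

# What would be stored per product: a template id and field values only.
stored = {
    "template_id": "general",
    "fields": {"title": "Phone <X>", "brand": "ExampleBrand"},
}
page = render(stored)
```

Changing the "general" template now affects every product on the next request, with no need to rewrite any stored record.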
High Inventory API Traffic
During a 2014 attack, inventory API saw >6 million requests per minute. Enabling Nginx proxy cache reduced load to normal levels.
WeChat API Surge
Implemented rate limiting and KV‑based throttling to protect backend services.
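The article does not say which throttling algorithm was used. As one common possibility, a fixed-window counter kept in a key-value store can be sketched as below; the dictionary stands in for the KV store, and the class and parameter names are assumptions.

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, store=None):
        self.limit = limit
        self.window_seconds = window_seconds
        # A dict stands in for the KV store; a real store would expire old keys.
        self.store = {} if store is None else store

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        key = (client_id, window)
        count = self.store.get(key, 0)
        if count >= self.limit:
            return False
        self.store[key] = count + 1
        return True

limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [
    limiter.allow("client-a", now=0),
    limiter.allow("client-a", now=1),
    limiter.allow("client-a", now=2),   # third request in the same window
    limiter.allow("client-b", now=2),   # a different client has its own budget
    limiter.allow("client-a", now=61),  # a new window starts
]
```

The read-then-write on the counter is not atomic in this sketch; a shared KV store would need an atomic increment to stay correct under concurrency.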
Nginx Proxy Cache Performance Drop
Cache on HDD caused memory pressure; switching to SSD or tmpfs and tuning kernel parameters restored performance.
Delivery Service Latency
Parallelized dependent service calls and pre‑fetched data, cutting TP99 from ~1 s to <500 ms.
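Issuing the downstream calls concurrently rather than one after another can be sketched with a thread pool. The sketch assumes the calls do not depend on each other's results; the service names and delays are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_all(calls, timeout_s):
    """Run independent calls concurrently and collect their results by name.
    Total latency is roughly that of the slowest call, not the sum."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        return {name: f.result(timeout=timeout_s) for name, f in futures.items()}

def slow(value, delay_s=0.2):
    """Build a fake service call that sleeps, then returns a fixed value."""
    def call():
        time.sleep(delay_s)
        return value
    return call

calls = {
    "delivery": slow("2 days"),
    "stock": slow("in stock"),
    "promotion": slow("10% off"),
}

start = time.perf_counter()
results = fetch_all(calls, timeout_s=2.0)
elapsed = time.perf_counter() - start
```

Run sequentially, the three fake calls would take about 0.6 s; run concurrently, the total stays close to the 0.2 s of a single call.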
Network Jitter 502 Errors
Reduced Twemproxy timeout settings (connection, read, write) to under 150 ms, improving reliability.
Excessive Machine Traffic
Moved GZIP compression from the access layer to individual services and adjusted compression levels, decreasing upstream traffic fivefold and CPU usage to ~4 %.
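The effect of the compression level on payload size can be explored with Python's standard `gzip` module. The sample page below is artificial and highly repetitive, so the ratios it produces say nothing about JD.com's real pages; the helper name is an assumption.

```python
import gzip

def compressed_sizes(payload, levels=(1, 6, 9)):
    """Return the gzip-compressed size in bytes at each compression level."""
    return {
        level: len(gzip.compress(payload, compresslevel=level))
        for level in levels
    }

# Artificial HTML-like payload; real pages compress less dramatically.
page = ("<div class='item'><span>Example product</span></div>\n" * 500).encode("utf-8")

sizes = compressed_sizes(page)
original_size = len(page)
```

Higher levels generally produce smaller output at the cost of more CPU time, so the level is worth measuring against real traffic rather than leaving at a default.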
Summary
The design principles are: data closed‑loop, dimensionalized storage, system decomposition, stateless task‑oriented workers, asynchronous concurrency, multi‑level caching, dynamic rendering, elasticity, degradation switches, multi‑datacenter active‑active deployment, and diverse load‑testing. At the implementation level, the key practices are intelligent Nginx access handling, minimal request headers, cookie‑free domains, selective proxy caching, non‑blocking locks, Twemproxy for connection reduction, Unix domain sockets, proper timeout settings, long‑lived connections, and service‑oriented architecture. Together these enable a highly performant, scalable product detail page platform.
21CTO