How JD.com Scaled Its Product Detail Pages to Billions of Views

This article details JD.com's evolution of product detail page architecture—from early IIS/C# monoliths to a distributed, cache‑heavy, Nginx+Lua system—covering front‑end dimensions, performance metrics, design principles, scaling challenges, storage engine choices, multi‑datacenter deployment, and the lessons learned from numerous production pitfalls.

21CTO

What Is a Product Detail Page

A product detail page displays comprehensive product information and serves as a major traffic and order entry point for JD.com. Multiple templates (general, global purchase, flash sale, automotive, clothing, group buy, etc.) share the same metadata but differ in presentation. Because of diverse personalization needs and data sources, a new architecture was designed, consisting of three parts: the static product detail system, a unified dynamic service, and a dynamic service for internal data provision.

Front‑End Structure

The front‑end is divided into dimensions such as product (title, images, attributes), main product (description, specs), category, merchant, shop, and high‑frequency data loaded asynchronously (price, promotion, delivery, pre‑sale, etc.).

Performance Data

During the 618 shopping festival, page views reached the billions while server‑side response time stayed below 38 ms at the 99th percentile (TP99).
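As a rough illustration of how a TP99 figure is read, the sketch below computes a nearest-rank percentile over a list of response times. The function name `percentile` and the sample latencies are made up for illustration; they are not JD.com measurements, and the source does not say which percentile method was used.

```python
def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical latencies: 990 fast requests and 10 slow ones.
samples = [20] * 990 + [400] * 10
tp99 = percentile(samples, 99)
```

With exactly 1% of requests slow, the TP99 still reflects the fast requests; one more slow request would push it to the slow value, which is why TP99 is sensitive to tail latency.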

Single‑Product Page Traffic Characteristics

Requests are scattered across a very large number of products, so there are few hotspots, and the pages are heavily crawled by bots and price‑comparison tools.

Evolution of Technical Architecture

Architecture 1.0

IIS + C# + SQL Server, later adding a memcached layer for caching. This setup suffered from performance jitter due to unstable dependent services.

Architecture 2.0

Introduced static HTML generation per product dimension. Workflow: 1) MQ notifies changes; 2) Java workers generate HTML; 3) rsync distributes files; 4) Nginx serves static pages; 5) load‑balancing at the access layer. Drawbacks included full re‑generation for category changes, rsync bottlenecks, and slow response to page‑level updates.

Architecture 3.0

Goals: rapid response to volatile requirements, support vertical page redesigns, modular pages, A/B testing, high performance, horizontal scalability, multi‑datacenter active‑active deployment. Main ideas: 1) MQ for data change notifications; 2) Heterogeneous workers store raw atomic data in JIMDB (Redis + persistent engine); 3) Synchronization workers aggregate data into JSON per dimension (basic info, product intro, other info); 4) Front‑end uses Nginx + Lua to fetch data and render templates; 5) A dynamic service layer provides key‑value data for any non‑relational use case.

Detail Page Architecture Design Principles

Data closed‑loop

Data dimensionalization

System decomposition

Stateless, task‑oriented workers

Asynchrony and concurrency

Multi‑level caching

Dynamic data acquisition

Elasticity

Degradation switches

Multi‑datacenter active‑active

Multiple load‑testing strategies

Data Closed‑Loop

All data is managed within the system without external dependencies. Data heterogenization imports external data and stores it as atomic records. Keeping the records atomic allows them to be re‑processed flexibly later. Data aggregation then combines them into a single JSON document for front‑end consumption. JIMDB stores the data with Redis‑style sharding and persistence, supporting both key‑value and relational queries.
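The atomic-then-aggregated flow can be sketched as follows. This is a minimal sketch, assuming an in-memory dict in place of JIMDB; the key layout (`sku:<id>:<field>`, `agg:<id>:<dimension>`), the field names, and the functions `aggregate` and `sync` are all hypothetical.

```python
import json

# Hypothetical atomic records, as heterogeneous workers might store them.
atomic_store = {
    "sku:1001:title": "Example Phone",
    "sku:1001:images": ["a.jpg", "b.jpg"],
    "sku:1001:attributes": {"color": "black"},
    "sku:1001:intro": "<p>Long description</p>",
}

# Which atomic fields make up each aggregated dimension (assumed split).
DIMENSIONS = {
    "basic": ["title", "images", "attributes"],
    "intro": ["intro"],
}

def aggregate(store, sku, dimension):
    """Build one JSON document for a dimension from its atomic fields."""
    fields = DIMENSIONS[dimension]
    doc = {field: store.get(f"sku:{sku}:{field}") for field in fields}
    return json.dumps(doc, sort_keys=True)

def sync(store, sku):
    """Re-aggregate every dimension for one product and store the result."""
    for dimension in DIMENSIONS:
        store[f"agg:{sku}:{dimension}"] = aggregate(store, sku, dimension)

sync(atomic_store, "1001")
```

Because the atomic records are kept, changing what goes into a dimension only requires editing `DIMENSIONS` and re-running `sync`, not re-importing data from the source systems.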

Data Dimensionalization

Data is split into: 1) Basic product info (title, attributes, images, specs); 2) Product intro info (merchant templates, descriptions); 3) Non‑product info (category, merchant, shop, brand); 4) Asynchronously loaded data (price, promotion, delivery, recommendations).

System Decomposition

The architecture is divided into sub‑systems: data heterogenization, data synchronization, front‑end detail page, and product intro services, reducing inter‑dependency and allowing independent scaling.

Stateless Workers + Task Queues

Workers are stateless for horizontal scaling, with configuration per datacenter. Multiple queues handle waiting, deduplication, local execution, and failures, prioritized by normal, data‑refresh, and high‑priority (e.g., flash‑sale) queues. Replay queues enable post‑deployment data correction.
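A minimal in-process sketch of a deduplicating, prioritized task queue is shown below. The class `DedupPriorityQueue`, the constant names, and the ordering of normal before data refresh are assumptions for illustration; the source only states that high-priority tasks such as flash sales get their own queue.

```python
import heapq
import itertools

# Lower number is served first. The relative order of NORMAL and REFRESH
# is an assumption, not something the source specifies.
HIGH, NORMAL, REFRESH = 0, 1, 2

class DedupPriorityQueue:
    def __init__(self):
        self._heap = []
        self._pending = {}              # task key -> priority currently queued
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per priority

    def put(self, key, priority=NORMAL):
        """Queue a task; drop it if the same key is already queued at an
        equal or better priority. Returns True if the task was queued."""
        current = self._pending.get(key)
        if current is not None and current <= priority:
            return False
        self._pending[key] = priority
        heapq.heappush(self._heap, (priority, next(self._seq), key))
        return True

    def get(self):
        """Return the next task key, or None when the queue is empty."""
        while self._heap:
            priority, _, key = heapq.heappop(self._heap)
            if self._pending.get(key) == priority:
                del self._pending[key]
                return key
            # Otherwise this entry is stale (upgraded or already served).
        return None

q = DedupPriorityQueue()
q.put("sku:1", NORMAL)
q.put("sku:1", NORMAL)   # duplicate, dropped
q.put("sku:2", REFRESH)
q.put("sku:3", HIGH)
q.put("sku:2", HIGH)     # same key re-queued at a higher priority
order = [q.get(), q.get(), q.get(), q.get()]
```

Since the queue holds all task state and workers hold none, any number of identical workers could drain it, which is the property that makes horizontal scaling straightforward.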

Multi‑Level Caching

The cache levels are: browser cache with Last‑Modified validation, CDN edge caching, an Nginx + Lua shared‑dict local cache, and a distributed cache (memory + SSD + JIMDB). Consistent hashing improves hit rates. Multi‑key reads (mget) check the local cache first and query the remote cache only for the keys that missed.
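The local-first mget can be sketched as below. This is a simplified sketch: a plain dict stands in for the Nginx shared dict, `remote_mget` is a hypothetical callable standing in for the distributed cache client, and expiry is left out.

```python
def two_level_mget(keys, local_cache, remote_mget):
    """Serve keys from the local cache and fetch only the misses remotely."""
    result = {}
    misses = []
    for key in keys:
        if key in local_cache:
            result[key] = local_cache[key]
        else:
            misses.append(key)
    if misses:
        # One batched remote call for all misses instead of one per key.
        fetched = remote_mget(misses)
        for key, value in fetched.items():
            local_cache[key] = value   # back-fill so the next read is local
            result[key] = value
    return result

# Hypothetical remote cache that records which keys it was asked for.
remote_data = {"price:1": 99, "price:2": 199, "price:3": 299}
remote_calls = []

def fake_remote_mget(keys):
    remote_calls.append(list(keys))
    return {k: remote_data[k] for k in keys if k in remote_data}

local = {"price:1": 99}
first = two_level_mget(["price:1", "price:2", "price:3"], local, fake_remote_mget)
second = two_level_mget(["price:1", "price:2"], local, fake_remote_mget)
```

In this run only the first call reaches the remote cache, and only for the two keys that were missing locally.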

Dynamic Rendering

Data is fetched per dimension at runtime, allowing template changes without redeployment. Nginx + Lua restarts take only seconds, which supports rapid feature rollout.

Elasticity

All services run in Docker containers with base images for quick provisioning; automatic scaling based on CPU or bandwidth is supported.

Degradation Switches

Centralized switch management pushes degradation flags to servers; multi‑level read services allow fallback from front‑end cache to data heterogenization to dynamic services.
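One way to picture the switch-controlled fallback chain is the sketch below. The function `read_with_fallback`, the level names, and the dict-based stores are hypothetical stand-ins; in the described system the flags would be pushed from the central switch manager rather than held in a local dict.

```python
def read_with_fallback(key, levels, switches):
    """Try each read level in order, skipping levels that are switched off
    or that fail, and return (level name, value) for the first hit."""
    for name, fetch in levels:
        if not switches.get(name, True):
            continue              # level degraded by an operator switch
        try:
            value = fetch(key)
        except Exception:
            continue              # treat an error as a miss and fall through
        if value is not None:
            return name, value
    return None, None

# Hypothetical stand-ins for the three read levels named in the text.
frontend_cache = {"sku:1001": "cached page data"}
heterogeneous_store = {"sku:1001": "aggregated JSON", "sku:1002": "aggregated JSON 2"}

def dynamic_service(key):
    return f"live data for {key}"

levels = [
    ("frontend_cache", frontend_cache.get),
    ("heterogeneous_store", heterogeneous_store.get),
    ("dynamic_service", dynamic_service),
]
switches = {"frontend_cache": True, "heterogeneous_store": True, "dynamic_service": True}
```

Setting `switches["heterogeneous_store"] = False` makes reads that miss the front-end cache go straight to the dynamic service, without touching the degraded level.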

Multi‑Datacenter Active‑Active

Stateless applications read from the cluster in their own datacenter; a one‑primary, three‑replica setup keeps the service available when a site fails.

Load‑Testing Strategies

Offline testing with Apache ab and JMeter (single‑URL stress).

Online testing using Tcpcopy to replay real traffic, optionally amplified, or using Nginx + Lua coroutines for distributed load generation.

Encountered Issues and Solutions

SSD Performance Degradation

Consumer‑grade SSDs (e.g., Samsung 840 Pro) showed unstable throughput under RAID configurations. Testing revealed RAID mode and controller age as factors; proper RAID and SSD selection are critical.

Key‑Value Store Selection Benchmark

Benchmarked LevelDB, RocksDB, and LMDB with 1.7 billion records (5‑30 KB each). LMDB exhibited stable performance with minimal jitter, leading to its adoption; RocksDB performed well for pure reads/writes but jittered under mixed workloads.

JIMDB Synchronization Bottleneck

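JIMDB synchronization requires dumping data to disk. With the SSDs already more than 50% full, dumping to the same disk ran short of space and stalled synchronization. Solutions included increasing the SSD count, using a SAS disk for the dump, and planning in‑memory forwarding to avoid dumps altogether.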

Master‑Slave Switch Overload

Transitioning from a 1‑master‑2‑slave to 1‑master‑3‑slave topology alleviated load spikes during failover.

Shard Configuration Complexity

Shard logic was centralized in Twemproxy, and an automated deployment pipeline pushes configuration changes, pausing MQ consumption during the push to keep data consistent.

Template Metadata Storage

Initially stored full HTML fragments in JIMDB, causing massive rewrites on template changes. Shifted to storing only metadata; Lua renders templates at runtime, reducing TP99 from 53 ms to 32 ms.
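The difference between storing rendered HTML and storing metadata can be sketched with Python's standard `string.Template`, used here in place of the Lua template engine. The metadata fields and both templates are invented for illustration.

```python
import html
from string import Template

# Stored once per product: metadata only, no markup.
metadata = {"sku": "1001", "title": "Example Phone", "image": "a.jpg"}

# Templates live with the front end and can change independently of the data.
TEMPLATE_V1 = Template('<h1>$title</h1><img src="$image">')
TEMPLATE_V2 = Template('<div class="hero"><img src="$image"><h1>$title</h1></div>')

def render(template, meta):
    """Render at request time, escaping values before substitution."""
    safe = {key: html.escape(str(value)) for key, value in meta.items()}
    return template.substitute(safe)

page_v1 = render(TEMPLATE_V1, metadata)
page_v2 = render(TEMPLATE_V2, metadata)
```

Switching from `TEMPLATE_V1` to `TEMPLATE_V2` changes every page without rewriting a single stored record, which is the rewrite cost the metadata approach avoids.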

High Inventory API Traffic

During an attack in 2014, the inventory API received more than 6 million requests per minute. Enabling Nginx proxy cache brought the load back to normal levels.

WeChat API Surge

Implemented rate limiting and KV‑based throttling to protect backend services.
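The source does not describe the throttling algorithm, so the following is only one plausible shape for KV-based throttling: a fixed-window counter keyed by client and time window. The class `FixedWindowLimiter` is hypothetical, and a dict stands in for the shared KV store.

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # (client, window index) -> request count. A real deployment would
        # keep this in a shared KV store and expire old windows; this dict
        # never evicts them.
        self._counters = {}

    def allow(self, client):
        window = int(self._clock() // self.window_seconds)
        key = (client, window)
        count = self._counters.get(key, 0)
        if count >= self.limit:
            return False
        self._counters[key] = count + 1
        return True

# Usage with a controllable clock so the behavior is easy to follow.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
decisions = [limiter.allow("wechat"), limiter.allow("wechat"), limiter.allow("wechat")]
now[0] = 61.0
after_window = limiter.allow("wechat")
```

A fixed window is simple but lets a client burst up to twice the limit across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.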

Nginx Proxy Cache Performance Drop

With the proxy cache stored on HDD, performance dropped and memory came under pressure; moving the cache to SSD or tmpfs and tuning kernel parameters restored performance.

Delivery Service Latency

Parallelized dependent service calls and pre‑fetched data, cutting TP99 from ~1 s to <500 ms.
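The effect of issuing dependent calls in parallel can be sketched with a thread pool. The service names and the 0.2 s delay are invented, and `fetch_all` is a hypothetical helper; the real system presumably does this inside Nginx + Lua rather than with Python threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_all(calls):
    """Run every call concurrently and collect the results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        return {name: future.result() for name, future in futures.items()}

def slow_service(name, delay_s=0.2):
    """Build a fake downstream call that takes delay_s seconds."""
    def call():
        time.sleep(delay_s)
        return f"{name} ok"
    return call

# Three hypothetical downstream services the delivery endpoint depends on.
calls = {name: slow_service(name) for name in ("stock", "delivery", "promise")}

start = time.monotonic()
results = fetch_all(calls)
elapsed = time.monotonic() - start
```

Called one after another, the three fake services would take at least 0.6 s in total; run concurrently, the elapsed time is bounded by the slowest single call plus scheduling overhead.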

Network Jitter 502 Errors

Reduced Twemproxy timeout settings (connection, read, write) to under 150 ms, improving reliability.

Excessive Machine Traffic

Moved GZIP compression from the access layer to individual services and adjusted compression levels, decreasing upstream traffic fivefold and CPU usage to ~4 %.

Summary

Data closed‑loop, dimensionalized storage, system decomposition, stateless task‑oriented workers, asynchronous concurrency, multi‑level caching, dynamic rendering, elasticity, degradation switches, multi‑datacenter active‑active deployment, diverse load‑testing, intelligent Nginx access handling, minimal request headers, cookie‑free domains, selective proxy caching, non‑blocking locks, Twemproxy for connection reduction, Unix domain sockets, proper timeout settings, long‑lived connections, and service‑oriented architecture collectively enable a highly performant, scalable product detail page platform.
