Scaling JD.com’s Product Detail Pages with Dynamic, High‑Performance Architecture

This article details the evolution and redesign of JD.com’s product detail page architecture, describing the transition from static HTML generation to a dynamic, high‑performance, multi‑datacenter system built on key‑value storage, Nginx + Lua, asynchronous processing, multi‑level caching, and robust scaling and reliability strategies.

21CTO

Aug 31, 2015

In this talk, the author shares the redesign of JD.com’s product detail page (PDP) architecture, driven by the need for high‑performance real‑time rendering and rapid response to complex, ever‑changing business requirements.

What Is a Product Detail Page

A PDP displays detailed product information and serves as a major traffic and order entry point. JD.com maintains many PDP templates (general, global purchase, flash sale, automotive, clothing, group buying, etc.) that share the same core logic but differ in front‑end behavior.

Personalized demands and numerous data sources (dozens of backend services) require an architecture that can handle urgent changes within minutes, something static HTML generation cannot provide.

Architecture Overview

The system consists of three main parts:

Product Detail Page System – responsible for the static portion of the whole page.

Dynamic Service System and Unified Service System – the Unified Service System handles real‑time data such as inventory; the Dynamic Service System provides data services to other internal systems.

Key‑Value Heterogeneous Data Cluster – stores atomic key‑value data to avoid costly relational joins.

Data is stored dimensionally (product, merchant, shop, etc.) and frequently cached in Redis for fast access.
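As a rough illustration of dimension-keyed storage, the sketch below uses a plain Python dict in place of the real key-value cluster. The key format (`product:<id>`, `shop:<id>`) and the field names are assumptions made for this example, not JD.com's actual schema.

```python
import json

# In-memory stand-in for the key-value cluster (hypothetical).
kv_store = {}

def put(dimension, entity_id, record):
    """Store one atomic record under a dimension-prefixed key."""
    kv_store[f"{dimension}:{entity_id}"] = json.dumps(record)

def get(dimension, entity_id):
    """Fetch one record by key; returns None when the key is absent."""
    raw = kv_store.get(f"{dimension}:{entity_id}")
    return json.loads(raw) if raw is not None else None

def load_page_data(sku_id):
    """Assemble page data with two key lookups instead of a relational join."""
    product = get("product", sku_id)
    if product is None:
        return None
    shop = get("shop", product["shop_id"])
    return {"product": product, "shop": shop}

put("product", 1001, {"name": "Example phone", "shop_id": 7})
put("shop", 7, {"name": "Example shop"})
page = load_page_data(1001)
```

Each dimension can be updated independently: a shop rename touches one `shop:<id>` key rather than every product page that shows it.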

History of PDP Architecture

Architecture 1.0

Technology stack: IIS + C# + SQL Server. Pages queried the database directly, which caused performance problems under load; a memcached layer was added later to cache the data.

Architecture 2.0 (Static Generation)

Static HTML is generated via a workflow: MQ notifies changes → Java workers fetch data from dependencies → generate HTML → rsync to other machines → Nginx serves the static files.

Main drawbacks:

Any category or breadcrumb change forces regeneration of the pages for all related products.

Rsync becomes a bottleneck as product count grows.

Frequent page‑level changes cannot be responded to quickly.

Architecture 2.1 (Fragmented Static Generation)

Products are routed by their suffix to multiple machines; HTML fragments (header, specs, breadcrumbs, etc.) are generated separately and assembled via Nginx SSI.
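Suffix routing of this kind might look like the sketch below. It assumes the "suffix" is the last decimal digit of a numeric product ID and that machines are chosen by modulo; the machine names are made up for the example.

```python
def route_by_suffix(sku_id, machines):
    """Pick a machine from the last decimal digit of the product ID."""
    suffix = sku_id % 10
    return machines[suffix % len(machines)]

machines = ["static-01", "static-02", "static-03"]
target = route_by_suffix(1234567, machines)  # suffix 7, and 7 % 3 == 1
```

With a machine count that does not divide 10 evenly, some machines receive more suffixes than others, so a real deployment would need to choose the suffix length and machine count together.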

Main drawbacks:

Too many fragment files, causing inode exhaustion.

Mechanical disks perform poorly with SSI under high concurrency.

Template changes still require massive re‑generation.

When storage capacity runs out, static pages are removed and requests fall back to dynamic rendering, which puts heavy load on downstream services.

Architecture 3.0 (Fully Dynamic)

Key pain points addressed:

Static capacity limits.

Inability to react to rapid, complex business changes.

The new design keeps the same data‑centric ideas but moves to real‑time rendering:

Data changes are still notified via MQ.

Heterogeneous‑data workers write raw atomic data to a JIMDB cluster (Redis + persistent engine).

A synchronization worker aggregates data by dimension (basic info, product intro, other info) into separate JIMDB clusters.

Front‑end rendering uses Nginx + Lua to fetch data and render templates on the fly.
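The four steps above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: a `deque` plays the message queue, dicts play the JIMDB clusters, `fetch_from_source` replaces real upstream calls, and the field-to-dimension mapping is invented for the example.

```python
from collections import deque

mq = deque()                                # stand-in for the message queue
atomic_store = {}                           # stand-in for the raw atomic-data cluster
dimension_stores = {"basic": {}, "intro": {}}  # stand-ins for per-dimension clusters

# Hypothetical mapping from atomic field to storage dimension.
FIELD_TO_DIMENSION = {"title": "basic", "description": "intro"}

def fetch_from_source(sku_id, field):
    """Hypothetical upstream call; returns canned data here."""
    return f"{field}-of-{sku_id}"

def heterogeneity_worker():
    """Consume change messages and write raw atomic data."""
    changed = []
    while mq:
        msg = mq.popleft()
        key = (msg["sku_id"], msg["field"])
        atomic_store[key] = fetch_from_source(*key)
        changed.append(msg["sku_id"])
    return changed

def sync_worker(sku_ids):
    """Aggregate atomic data for the changed products by dimension."""
    wanted = set(sku_ids)
    for (sku_id, field), value in atomic_store.items():
        if sku_id in wanted:
            dimension = FIELD_TO_DIMENSION[field]
            dimension_stores[dimension].setdefault(sku_id, {})[field] = value

def render(sku_id):
    """Render at request time from the dimension stores."""
    basic = dimension_stores["basic"].get(sku_id, {})
    intro = dimension_stores["intro"].get(sku_id, {})
    return f"<h1>{basic.get('title', '')}</h1><div>{intro.get('description', '')}</div>"

mq.append({"sku_id": 1001, "field": "title"})
mq.append({"sku_id": 1001, "field": "description"})
sync_worker(heterogeneity_worker())
html = render(1001)
```

Because rendering happens per request, a template change takes effect immediately and requires no regeneration of stored pages.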

Principles guiding the new system include data closed‑loop, dimensional storage, stateless workers, asynchronous and concurrent processing, multi‑level caching, dynamic rendering, elasticity, and graceful degradation.

Key Design Principles

Data closed‑loop – keep all data within the system, avoiding external dependencies.

Data dimensionalization – store data by product, merchant, shop, etc., enabling efficient retrieval.

System decomposition – split responsibilities across heterogeneous data, synchronization, and front‑end services.

Stateless, task‑oriented workers – horizontal scalability.

Asynchronous + concurrent processing – use message queues and parallel calls to reduce latency.

Multi‑level caching – browser cache, CDN, Nginx shared dict, local and remote Redis clusters.

Dynamic rendering – templates rendered at request time, supporting rapid UI changes.

Elastic scaling – Docker containers and auto‑scaling based on CPU or bandwidth.

Degrade‑switches – centralized feature flags that let the system fall back gracefully under pressure.

Multi‑datacenter active‑active deployment – each datacenter reads its own replica, with failover to other zones.

Comprehensive load testing – offline (ab, JMeter) and online (tcpcopy, traffic replay).
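To make the multi-level caching principle concrete, here is a minimal two-level read-through sketch. The `TTLCache` class, the TTL values, and `load_from_origin` are assumptions for illustration; in the system described, the local level would be something like an Nginx shared dict and the remote level a Redis cluster.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.data[key]
            return None
        return value

    def set(self, key, value):
        self.data[key] = (value, time.monotonic() + self.ttl)

local_cache = TTLCache(ttl_seconds=1)    # short-lived, per-process level
remote_cache = TTLCache(ttl_seconds=60)  # longer-lived, shared level
origin_calls = []

def load_from_origin(key):
    """Hypothetical slow backend call; records each invocation."""
    origin_calls.append(key)
    return f"value-for-{key}"

def get_with_levels(key):
    """Check the local level, then the remote level, then the origin."""
    value = local_cache.get(key)
    if value is not None:
        return value
    value = remote_cache.get(key)
    if value is None:
        value = load_from_origin(key)
        remote_cache.set(key, value)
    local_cache.set(key, value)
    return value

first = get_with_levels("sku:1001")
second = get_with_levels("sku:1001")  # served from the local level
```

The short local TTL bounds how stale a hot key can get, while the longer remote TTL keeps most misses away from the origin.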

Problems Encountered and Solutions

SSD Performance Issues

Consumer‑grade SSDs (Samsung 840 Pro) showed unstable throughput; switched to enterprise‑grade Intel 3500 drives.

Key‑Value Store Selection

Benchmarked LevelDB, RocksDB, BeansDB, LMDB, Riak; LMDB offered stable performance for mixed read/write workloads.

JIMDB Synchronization Bottlenecks

Large data volumes caused dump‑and‑sync failures. The fixes were to add more SSDs per machine and to use dedicated SAS disks for synchronization; forwarding data directly from memory is planned.

Master‑Slave Switch Overhead

Original one‑master‑two‑slave setup caused latency spikes during failover; upgraded to one‑master‑three‑slave for smoother transitions.

Shard Configuration Complexity

Introduced Twemproxy to centralize shard logic and automated deployment to reduce manual changes.

Template Metadata Storage

Moved from storing full HTML fragments to storing only metadata; Lua renders templates using this metadata, reducing storage size.
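The idea can be illustrated in Python (the production renderer is Lua). In this sketch the templates live with the rendering code and only a small metadata record is stored per product; the template names and metadata fields are invented for the example.

```python
from html import escape
from string import Template

# Hypothetical templates, shipped with the renderer rather than stored per product.
TEMPLATES = {
    "spec_table": Template("<table>$rows</table>"),
    "spec_row": Template("<tr><td>$name</td><td>$value</td></tr>"),
}

# Only metadata is stored per product (field names are illustrative).
stored_metadata = {
    1001: {"template": "spec_table", "specs": [["Color", "Black"], ["Weight", "150 g"]]},
}

def render_fragment(sku_id):
    """Build the HTML fragment from stored metadata at request time."""
    meta = stored_metadata[sku_id]
    rows = "".join(
        TEMPLATES["spec_row"].substitute(name=escape(name), value=escape(value))
        for name, value in meta["specs"]
    )
    return TEMPLATES[meta["template"]].substitute(rows=rows)

fragment = render_fragment(1001)
```

Changing the markup then means editing one template, not rewriting a stored fragment for every product.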

High Inventory Request Volume

During a flash sale, the inventory API received more than 6 million requests per minute; enabling the Nginx proxy cache to throttle requests and cache responses stabilized the system.

Network Jitter and 502 Errors

Reduced Twemproxy timeout settings (connection, read, write) to 150 ms and added fallback to dynamic services.
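A timeout-plus-fallback read might be sketched as follows. The 150 ms budget comes from the text; the two read functions and the simulated delay are hypothetical stand-ins for the Twemproxy-backed cache and the dynamic service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

TIMEOUT_SECONDS = 0.150  # the 150 ms budget mentioned above

executor = ThreadPoolExecutor(max_workers=4)

def read_from_cache_cluster(key, delay):
    """Hypothetical cache read; `delay` simulates network jitter."""
    time.sleep(delay)
    return f"cached:{key}"

def read_from_dynamic_service(key):
    """Hypothetical fallback that computes the value directly."""
    return f"dynamic:{key}"

def get_with_fallback(key, simulated_delay):
    """Try the cache within the time budget, else use the fallback."""
    future = executor.submit(read_from_cache_cluster, key, simulated_delay)
    try:
        return future.result(timeout=TIMEOUT_SECONDS)
    except FutureTimeout:
        # The slow read keeps running in its thread; its result is ignored.
        return read_from_dynamic_service(key)

fast = get_with_fallback("sku:1001", simulated_delay=0.0)
slow = get_with_fallback("sku:1001", simulated_delay=0.5)
executor.shutdown(wait=True)
```

A short timeout turns a stalled connection into a quick, bounded failure that the fallback can absorb, instead of a request that hangs until the gateway returns a 502.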

Excessive Traffic on Access Layer

Moved GZIP compression from the access layer to individual services, cutting upstream traffic by ~80% and lowering CPU usage.
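Why compressing at the service helps can be seen from how well repetitive markup compresses. The sketch below measures the ratio for a made-up HTML body; it does not reproduce the article's ~80% figure, which depends on the real page content.

```python
import gzip

def compressed_ratio(body: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(body)) / len(body)

# Made-up, highly repetitive markup standing in for a service response.
row = '<li class="item"><span>Example product row</span></li>\n'
body = (row * 500).encode("utf-8")
ratio = compressed_ratio(body)
```

When the service compresses its own response, the bytes travelling between the service and the access layer shrink by this ratio, and the access layer no longer spends CPU on compression.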

Summary

Data closed‑loop

Dimensional data storage

System decomposition

Stateless, task‑oriented workers

Asynchronous + concurrent processing

Multi‑level caching

Dynamic rendering

Elastic scaling

Graceful degradation switches

Active‑active multi‑datacenter deployment

Robust load‑testing strategies

Optimized access‑layer handling (header trimming, stateless domains, selective proxy caching)

Connection pooling and non‑blocking locks for cache stampede protection

Twemproxy for Redis connection reduction

Unix domain sockets to lower TCP overhead

Reasonable timeout configurations

Long‑connection reuse

Service‑oriented design to eliminate direct DB dependencies

Domain‑based client connection partitioning
