
Design and Evolution of JD.com Product Detail Page Architecture for High Performance and Scalability

This article details JD.com's transition from static to dynamic, high‑performance product detail page architecture, describing the multi‑layer system design, data heterogeneity, caching strategies, scaling techniques, operational challenges, and the solutions implemented to meet rapid business demands during massive traffic events.

Qunar Tech Salon

After the JD.com 618 event, the need for a high‑performance, real‑time rendering solution for product detail pages became evident as static approaches could no longer handle the growing complexity and variability of business requirements.

What is a Product Detail Page

The product detail page displays comprehensive product information and serves as a major traffic and order entry point. JD.com maintains many template variants (general, global purchase, flash sale, automotive, fashion, group‑buy, etc.) that share core data but differ in front‑end logic.

Personalized requirements and numerous data sources (dozens of backend services) demand an architecture that can respond to changes within 5‑10 minutes, handling urgent requests such as regulatory complaints.

Architecture Overview

Product Detail Page System: handles the static portion of the entire page.

Dynamic Service System and Unified Service System: the Unified Service System provides real-time data such as inventory. Several core services are already online, with plans to roll out new inventory services to a fraction of traffic. The Dynamic Service System offers data to internal systems (e.g., large-customer services) and has been stable for six months, primarily serving list pages.

Key-Value Heterogeneous Data Cluster: to avoid performance issues caused by heavy relational joins, key-value stores are used for fast queries, while relational data is handled by a separate heterogeneous system.

Historical Evolution

Architecture 1.0

IIS + C# + SQL Server with a memcached layer for caching; suffered from instability due to dependent services.

Architecture 2.0

Static generation of HTML per product dimension using MQ, Java workers, rsync, and Nginx. Main drawbacks: full page regeneration for minor changes, rsync bottlenecks, and slow response to page‑level updates.

Architecture 2.1

Routing by product suffix to multiple machines, generating HTML fragments per dimension, and merging via Nginx SSI. Issues included excessive fragment files, SSD performance under high concurrency, and long full‑page regeneration times.

Architecture 3.0

Fully dynamic rendering built from three subsystems: a data subsystem in which MQ data-change notifications trigger heterogeneous workers to store atomic data in JIMDB (a Redis-based KV store with persistence); an aggregation subsystem whose workers build dimension-based JSON; and a front-end display subsystem rendering with Nginx + Lua.

Design Principles

Data closed‑loop

Data dimensionization

System decomposition

Stateless, task‑oriented workers

Asynchrony and concurrency

Multi‑level caching

Dynamic rendering

Elastic scaling

Degradation switches

Multi‑datacenter active‑active

Comprehensive load testing

Key Techniques

Data Heterogeneity : Store raw atomic data (basic info, extensions, specs, categories, merchants) in KV stores; aggregate into larger JSON for front‑end consumption.
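The aggregation step can be sketched as follows. This is a minimal illustration, not JD.com's actual schema: the key pattern (`sku:<id>:<dimension>`) and dimension names are assumptions standing in for the real atomic records a heterogeneous worker would write to the KV store.

```python
import json

# Hypothetical atomic records as a heterogeneous worker might store them.
# Key names and dimensions are illustrative, not JD.com's real schema.
kv = {
    "sku:1001:base":     json.dumps({"name": "Phone X", "brand": "Acme"}),
    "sku:1001:spec":     json.dumps({"screen": "6.1in", "ram": "8GB"}),
    "sku:1001:category": json.dumps({"path": ["Electronics", "Phones"]}),
}

def aggregate(sku_id: str) -> str:
    """Merge atomic dimensions into one JSON blob for front-end reads,
    so the display layer does a single KV get instead of many."""
    merged = {}
    for dim in ("base", "spec", "category"):
        raw = kv.get(f"sku:{sku_id}:{dim}")
        if raw is not None:
            merged[dim] = json.loads(raw)
    return json.dumps(merged)

page_json = aggregate("1001")
```

The payoff is on the read path: the front end fetches one pre-merged document per product rather than joining dozens of service responses at request time.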

Data Storage : Use JIMDB (Redis + LMDB) with hash‑tag sharding and Twemproxy for reduced connection overhead; combine memory, SSD, and persistent storage.
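Hash-tag sharding can be sketched in a few lines. The `{...}` tag convention below follows the Redis Cluster style; the shard count and key layout are assumptions for illustration. The point is that all dimensions of one SKU share a tag and therefore land on the same shard.

```python
import zlib

NUM_SHARDS = 4  # illustrative shard count

def hash_tag(key: str) -> str:
    """Extract the {tag} portion of a key (Redis-cluster convention) so
    related keys, e.g. all dimensions of one SKU, hash identically."""
    start = key.find("{")
    end = key.find("}", start + 1)
    if start != -1 and end > start + 1:
        return key[start + 1:end]
    return key  # no tag: hash the whole key

def shard_for(key: str) -> int:
    return zlib.crc32(hash_tag(key).encode()) % NUM_SHARDS

# All dimensions of SKU 1001 map to the same shard:
shards = {shard_for(f"{{sku:1001}}:{d}") for d in ("base", "spec", "price")}
```

Co-locating one product's keys makes multi-get operations single-shard, which is what makes the aggregation reads cheap.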

Dimension‑Based Storage : Separate product basic info, product introduction, non‑product info, and asynchronously loaded data (price, promotion, delivery, etc.).

System Splitting : Separate heterogeneous data system, aggregation system, and front‑end display system to isolate impacts.

Stateless Workers : Horizontal scaling of data‑heterogeneous and aggregation workers; task queues with priority levels for urgent items (e.g., flash‑sale items).

Asynchrony & Concurrency : Message‑driven decoupling, concurrent data fetching, request merging, and parallel service calls to reduce latency.
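The concurrent-fetch idea can be sketched with asyncio. The fetchers and their latencies below are stand-ins for the real backend service calls; the point is that total latency approximates the slowest call rather than the sum.

```python
import asyncio

# Hypothetical per-dimension fetcher; a real one would call a backend service.
async def fetch(dim: str, delay: float) -> tuple:
    await asyncio.sleep(delay)  # simulate network latency
    return dim, f"{dim}-data"

async def build_page(sku_id: str) -> dict:
    # Issue all service calls concurrently instead of sequentially.
    results = await asyncio.gather(
        fetch("price", 0.02),
        fetch("promotion", 0.03),
        fetch("delivery", 0.01),
    )
    return dict(results)

page = asyncio.run(build_page("1001"))
```

Sequentially these calls would take ~60 ms; gathered, they take ~30 ms, bounded by the slowest dependency.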

Multi‑Level Caching : Browser cache, CDN cache, Nginx+Lua shared dict, and local proxy cache to minimize backend load.
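A minimal model of the fallthrough chain, assuming in-process dicts as stand-ins for the real tiers (shared dict, proxy cache, etc.): check each tier in order, fall back to the backend, and backfill the tiers that missed.

```python
# Tier names are illustrative stand-ins for the real cache layers.
class CacheChain:
    def __init__(self, *names):
        self.names = names
        self.tiers = [dict() for _ in names]  # each tier as a simple dict

    def get(self, key, load_from_backend):
        for name, tier in zip(self.names, self.tiers):
            if key in tier:
                return tier[key], name        # served from a cache tier
        value = load_from_backend(key)        # every tier missed
        for tier in self.tiers:               # backfill on the way out
            tier[key] = value
        return value, "backend"

chain = CacheChain("local_dict", "proxy_cache")
v1, src1 = chain.get("sku:1001", lambda k: "page-json")  # miss -> backend
v2, src2 = chain.get("sku:1001", lambda k: "page-json")  # hit first tier
```

Each tier that answers a request is one fewer request reaching the layer behind it, which is how the stack keeps backend load flat during traffic spikes.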

Dynamic Rendering : Nginx+Lua renders templates on demand, allowing rapid template changes and AB testing.

Elasticity : Docker‑based deployments with base images; automatic scaling based on CPU or bandwidth.

Degradation Switches : Centralized feature flags pushed to servers; fallback paths from front‑end cache to heterogeneous cache to dynamic services.
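The fallback path gated by switches can be sketched as below. The flag names and the three read sources are assumptions modeling the described chain (front-end cache, then heterogeneous cache, then dynamic service), not JD.com internals.

```python
# Flags would be pushed centrally to all servers; here a plain dict.
flags = {"use_front_cache": True, "use_kv_cache": True}

def read_page(sku_id, front_cache, kv_cache, dynamic_service):
    """Walk the degradation chain, skipping any tier whose switch is off."""
    if flags["use_front_cache"] and sku_id in front_cache:
        return front_cache[sku_id]
    if flags["use_kv_cache"] and sku_id in kv_cache:
        return kv_cache[sku_id]
    return dynamic_service(sku_id)  # last resort: real-time call

page = read_page("1001", {}, {"1001": "cached-json"}, lambda s: "live-json")

# Flipping a switch reroutes traffic without a redeploy:
flags["use_kv_cache"] = False
page2 = read_page("1001", {}, {"1001": "cached-json"}, lambda s: "live-json")
```

The operational value is that a misbehaving tier can be cut out of the path in seconds by toggling a flag, instead of rolling new code during an incident.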

Multi‑Datacenter Active‑Active : Stateless applications with per‑datacenter configuration; one‑master‑three‑slave Redis topology for resilience.

Operational Challenges & Solutions

SSD performance variability – switched to enterprise‑grade Intel SSDs.

KV store selection – benchmarked LevelDB, RocksDB, LMDB; chose LMDB for stable mixed read/write performance.

JIMDB sync bottlenecks – increased SSD count, used dedicated SAS disks for dumps, and planned a move to in-memory transfer.

Master‑slave switch latency – moved from RDB dump to direct memory replication, expanded to one‑master‑three‑slave.

Shard configuration complexity – introduced Twemproxy for centralized sharding and automated deployment.

Template metadata storage – stored only metadata in JIMDB, performed rendering logic in Lua, reducing storage size.

High inventory request volume – applied Nginx proxy cache and short‑term caching to mitigate spikes of ~6 million requests per minute.

WeChat interface surge – rate‑limited KV reads and backend services.

Nginx proxy cache performance drop – tuned kernel parameters, moved cache to memory/tmpfs, and used SSDs.

Network‑induced 502 errors – reduced Twemproxy timeouts to <150 ms and added graceful degradation.

Network bandwidth overload – shifted compression to application layer, adjusted GZIP levels, achieving ~5× traffic reduction.
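The compression-level trade-off behind that last fix can be demonstrated with stdlib gzip. The payload is synthetic; the point is that the compression level is a CPU-versus-bytes knob that the application layer, unlike a fixed edge config, can tune per deployment.

```python
import gzip

# Synthetic, highly compressible JSON-like payload for illustration.
payload = b'{"sku": "1001", "detail": "' + b"x" * 5000 + b'"}'

# Lower compresslevel spends less CPU per request; higher squeezes more
# bytes out of the wire. Either way the network transfer shrinks sharply.
fast = gzip.compress(payload, compresslevel=1)
best = gzip.compress(payload, compresslevel=9)
```

Moving this step into the application also means compressed bytes traverse the internal network, not just the last hop to the client, which is where the ~5× reduction came from.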

Summary

Data closed‑loop

Data dimensionization

System decomposition

Stateless, task‑oriented workers

Asynchrony & concurrency

Multi‑level caching

Dynamic rendering

Elastic scaling

Degradation switches

Multi‑datacenter active‑active

Comprehensive load testing

Nginx‑based gray release and header filtering

Stateless domains without cookies

Selective Nginx proxy caching

Non‑blocking locks for cache stampede protection

Twemproxy to reduce Redis connections

Unix domain sockets to cut TCP overhead

Reasonable connection/read/write timeouts

Long‑lived connections to reduce backend load

Service‑oriented migration away from direct DB dependencies

Domain‑based client connection limits
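Among the points above, the non-blocking lock for cache-stampede protection deserves a sketch. This is a single-process model with assumed names: on a refresh, only the caller that wins the try-lock hits the backend; everyone else serves the existing (possibly stale) copy instead of piling on.

```python
import threading

cache = {"sku:1001": "stale-json"}   # existing cached copy
rebuild_lock = threading.Lock()
backend_calls = 0

def get(key):
    """Try-lock rebuild: the winner refreshes from the backend, losers
    return the current cached value without blocking or stampeding."""
    global backend_calls
    if rebuild_lock.acquire(blocking=False):  # non-blocking attempt
        try:
            backend_calls += 1                # only the winner pays this cost
            cache[key] = "fresh-json"         # stand-in for a backend fetch
        finally:
            rebuild_lock.release()
    return cache[key]

# Simulate another worker mid-rebuild: the lock is held, so this call
# must serve the stale value rather than issue a second backend fetch.
rebuild_lock.acquire()
v_during = get("sku:1001")
rebuild_lock.release()

v_after = get("sku:1001")  # lock free again: wins it and refreshes
```

Serving slightly stale data to lock losers is the deliberate trade: one backend fetch per expiry window instead of one per concurrent request.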

Q&A

Various questions about dependency volatility, static‑JS shielding, internal service exposure, MQ error handling, template compilation, JIMDB characteristics, price/stock caching, and storage engine choices are answered, illustrating practical operational insights.

Tags: backend, distributed systems, architecture, scalability, caching, high performance, dynamic rendering
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
