How China’s Top E‑commerce Giants Engineer Their Backend for Double‑11 Traffic Surges

This article examines how leading Chinese e‑commerce platforms such as Dangdang, Suning, Mogujie, and Vipshop redesigned their backend systems and applied decoupling, caching, scaling, and monitoring techniques to handle the massive traffic spikes of the Double‑11 shopping festival.

21CTO

According to figures released by the e‑commerce companies themselves, this year's Double‑11 shopping festival broke previous records, and keeping the platforms running took a major engineering effort behind the scenes.

Dangdang – Two Major Initiatives

Initiative 1: Rebuilding the Promotion System

Purpose: support fine‑grained, multi‑channel promotion throughout the entire purchase flow. Goal: allow time‑based rules, stacked activities, and rapid adjustment based on execution results. Core improvements focus on data‑model alignment, system decoupling, and stronger data processing.
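
To make "time‑based rules" and "stacked activities" concrete, here is a minimal sketch of what such a model could look like. It is not Dangdang's actual data model: the `Promotion` class, its fields, the `apply_promotions` helper, and the stacking policy are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Promotion:
    """Hypothetical promotion rule; every field name here is illustrative."""
    name: str
    starts_at: datetime
    ends_at: datetime
    discount_rate: Decimal   # Decimal("0.10") means 10% off
    stackable: bool          # may be combined with other stackable promotions

    def is_active(self, now: datetime) -> bool:
        # Time-based rule: valid in the half-open window [starts_at, ends_at).
        return self.starts_at <= now < self.ends_at


def apply_promotions(price: Decimal, promotions: list[Promotion], now: datetime) -> Decimal:
    """Return the lowest price reachable under the currently active promotions.

    Assumed policy: stackable promotions compound with each other, a
    non-stackable promotion applies on its own, and the cheaper outcome wins.
    """
    active = [p for p in promotions if p.is_active(now)]
    stacked = price
    for promo in active:
        if promo.stackable:
            stacked *= Decimal(1) - promo.discount_rate
    exclusive = [price * (Decimal(1) - p.discount_rate) for p in active if not p.stackable]
    return min([stacked, *exclusive]).quantize(Decimal("0.01"))
```

Keeping the rule data (`Promotion`) separate from the calculation (`apply_promotions`) is one way to let operators adjust windows and rates without touching cart or order code.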

Solution steps:

Define a basic promotion model.

Abstract an activity model on top of it.

Decouple by replacing direct DB reads and redundant storage with service calls and MQ listeners.

Extract promotion logic into a transaction service, separating validation and calculation from cart and order services.

Expose promotion attributes (type enum, tags, purchase limits, inventory policies) via a plugin‑based architecture.

Enhance the promotion query service with Redis caching and MQ‑driven cache invalidation.

Note: Redis reduces DB pressure but does not alleviate CPU load for compute‑intensive tasks, and event‑driven cache clearing can cause stale data if not carefully synchronized.
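
The caching step and the stale‑data caveat in the note can be sketched as follows. This is an illustrative model rather than the production design: plain dictionaries and a `queue.Queue` stand in for the database, Redis, and the message queue, and the class and method names are hypothetical.

```python
import queue
from typing import Optional


class PromotionQueryService:
    """Cache-aside reads with message-driven invalidation (in-memory stand-ins)."""

    def __init__(self, db: dict):
        self.db = db                  # stand-in for the promotion database
        self.cache = {}               # stand-in for Redis
        self.events = queue.Queue()   # stand-in for the MQ topic
        self.db_reads = 0             # counts cache misses that reached the DB

    def get_promotion(self, promo_id: str) -> Optional[dict]:
        cached = self.cache.get(promo_id)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self.db.get(promo_id)
        if row is not None:
            self.cache[promo_id] = dict(row)   # cache a copy of the row
        return row

    def update_promotion(self, promo_id: str, fields: dict) -> None:
        # Write the database first, then publish an invalidation message.
        self.db[promo_id] = {**self.db.get(promo_id, {}), **fields}
        self.events.put(promo_id)

    def drain_invalidations(self) -> int:
        """Consumer side: evict every key named in a pending message."""
        evicted = 0
        while True:
            try:
                promo_id = self.events.get_nowait()
            except queue.Empty:
                return evicted
            self.cache.pop(promo_id, None)
            evicted += 1
```

Between `update_promotion` and the next `drain_invalidations`, readers still see the old cached value. That gap is the stale‑data window the note warns about; a short TTL on cached entries is a common way to bound it.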

Initiative 2: Rebuilding the Transaction System

Centralized configuration with hot‑load capability.

Page‑level caching of large computation results.

Merge multiple settlement versions into a shared core logic, handling special cases (flash sale, one‑click order, re‑shipment) separately.

Gray release and seamless switch using Nginx reload; each Nginx balances two app servers, and during gray release only one server receives traffic.

Online parallel comparison of new and old transaction logic to ensure correctness.

User‑ID based traffic splitting with whitelist pre‑validation before gradual rollout.

On‑demand web‑server scaling by adding server IPs to Nginx configuration and reloading.
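
The user‑ID traffic splitting and the online comparison of old and new logic could look roughly like the sketch below. The function names, the 100‑bucket scheme, and the choice of SHA‑256 are assumptions made for illustration; the source does not describe how Dangdang implemented either step.

```python
import hashlib


def bucket_of(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    # A cryptographic digest is used because Python's built-in hash() for
    # strings is randomized per process and would not be stable across servers.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def use_new_logic(user_id: str, whitelist: set[str], rollout_percent: int) -> bool:
    """Whitelisted users always take the new path; others join as the percentage grows."""
    if user_id in whitelist:
        return True
    return bucket_of(user_id) < rollout_percent


def settle_with_comparison(order, old_settle, new_settle, mismatches: list):
    """Run both implementations, record any difference, and serve the old result."""
    old_result = old_settle(order)
    try:
        new_result = new_settle(order)
    except Exception as exc:
        # A failure in the new path is logged but never reaches the customer.
        mismatches.append((order, old_result, repr(exc)))
        return old_result
    if new_result != old_result:
        mismatches.append((order, old_result, new_result))
    return old_result
```

Because a user's bucket never changes, raising `rollout_percent` only adds users to the new path and never moves anyone back, which keeps each user's experience consistent during a gradual rollout.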

Suning – Four Strategic Directions

Direction 1: System Splitting

Analyze the main flow, separate core and peripheral systems, and deploy them independently. Core subsystems include member, product, inventory, pricing, cart, transaction, order, and CMS; marketing systems such as flash‑sale and coupon platforms are also isolated.

Direction 2: Basic Platform

Build self‑managed CDN, cloud compute, and storage; provide common services (SMS, email, verification); integrate middleware for session sharing, distributed tracing, Redis sharding, DB sharding, config management, and flow control; operate monitoring, CI, big‑data analytics, and risk‑control platforms.

Direction 3: R&D Process

Adopt code‑fly‑review (periodic rapid core‑code reviews by senior architects) and a release checklist involving product, development, QA, DBA, ops, and online verification to ensure quality before launch.

Direction 4: System Assurance

Preparation 1: Increase Load Capacity – forecast PV/UV/TPS per system, assess current capacity, conduct top‑down health checks, and tune architecture, critical code, and middleware.

Preparation 2: Emergency Plans – implement blacklists, rate‑limiting (IP, user, URL), and degradation strategies (disable non‑critical features, static pages, captchas, extended cache TTL, bypass backend compensation) based on pressure levels.
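
As a sketch of how per‑IP, per‑user, and per‑URL rate limiting and level‑based degradation might fit together, the example below uses a sliding‑window counter. The class name, the thresholds, and the mapping from pressure level to actions are illustrative assumptions, not Suning's configuration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)   # key -> timestamps of accepted requests

    def allow(self, key, now: float) -> bool:
        accepted = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while accepted and now - accepted[0] >= self.window:
            accepted.popleft()
        if len(accepted) >= self.limit:
            return False
        accepted.append(now)
        return True


# Hypothetical plan: which degradation actions to enable at each pressure level.
DEGRADATION_PLAN = {
    0: [],
    1: ["extend_cache_ttl"],
    2: ["extend_cache_ttl", "disable_non_critical_features", "require_captcha"],
    3: ["extend_cache_ttl", "disable_non_critical_features", "require_captcha",
        "serve_static_pages"],
}


def pressure_level(load_ratio: float) -> int:
    """load_ratio = observed TPS / assessed capacity; thresholds are illustrative."""
    if load_ratio < 0.70:
        return 0
    if load_ratio < 0.85:
        return 1
    if load_ratio < 1.00:
        return 2
    return 3
```

Keys such as `("ip", "10.0.0.1")`, `("user", "u1")`, or `("url", "/cart")` let one limiter class cover all three dimensions; in practice each dimension would likely get its own limit.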

Mogujie – Five Focus Areas

Focus 1: Stability

Resolve kernel/device‑mapper issues (disable dm‑thin discard, prevent disk over‑allocation) to avoid random crashes and read‑only file‑system states.

Focus 2: Multi‑Dimensional Threshold Monitoring

Real‑time monitoring of each container's PID count, since there is no container‑level pid_max limit.

Adjusted memory usage calculation (treat 40% of cache as RSS).

Fix out‑of‑order logs by running rsyslog on the host only.

Dynamic CPU/memory/I/O throttling and health scans for proactive risk detection.
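
The adjusted memory figure and a multi‑dimensional threshold check can be sketched as below. The 40% weighting comes from the text; everything else is an assumption, including the idea that the inputs are the `rss` and `cache` counters of a cgroup v1 `memory.stat` file and all function names.

```python
def parse_memory_stat(text: str) -> dict:
    """Parse 'name value' lines, the layout used by cgroup v1 memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def effective_memory(stats: dict, cache_percent: int = 40) -> int:
    """RSS plus a fixed share of page cache, counted as if it were RSS."""
    return stats.get("rss", 0) + stats.get("cache", 0) * cache_percent // 100


def breaches(usage: dict, thresholds: dict) -> list:
    """Return the sorted names of metrics that have reached their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if usage.get(name, 0) >= limit
    )
```

A monitoring loop could feed `effective_memory` and the container's PID count into `breaches` on every sample and raise an alert whenever the returned list is non‑empty.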

Focus 3: Disaster Recovery & Emergency Handling

Offline Docker data recovery via temporary dmsetup device; support cold migration of containers through a one‑click management UI.

Focus 4: Integration with Existing Ops Systems

Unified container management platform enables provisioning of Docker containers across clusters within 7 seconds.

Focus 5: Performance Optimization

Kernel I/O tuning (vm.dirty_expire_centisecs, vm.dirty_writeback_centisecs, vm.extra_free_kbytes).

Deploy Facebook’s flashcache to use SSD as cache for Docker I/O.

Reduce Docker image layers to speed up pulls.

Vipshop – Six Design Points

Point 1: Effective Module Separation

Physically split business sub‑systems to reduce coupling, allowing independent deployment and rapid isolation of failures.

Point 2: Service‑Oriented Decoupling & Centralized Governance

Venus framework (built on Spring) provides database access abstraction, Redis/Memcached wrappers, CRUD code generation, OSP/REST call support, configuration center, and documentation hub.

Point 3: Asynchronous Access

Offload non‑real‑time operations to async processing; use distributed message queues with retry mechanisms for reliability.
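
A minimal sketch of queue‑based asynchronous processing with retries follows. A `queue.Queue` stands in for the distributed message queue, and the `submit` and `process_with_retry` names, the attempt counter, and the dead‑letter list are illustrative assumptions rather than Vipshop's design.

```python
import queue


def submit(tasks: queue.Queue, message) -> None:
    """Producer side: enqueue a message with its attempt counter set to zero."""
    tasks.put((message, 0))


def process_with_retry(tasks: queue.Queue, handler, max_attempts: int = 3) -> list:
    """Drain the queue, requeueing failures until max_attempts is reached.

    Messages that still fail on their last attempt are returned as a
    dead-letter list so they can be inspected or replayed later.
    """
    dead_letters = []
    while True:
        try:
            message, attempt = tasks.get_nowait()
        except queue.Empty:
            return dead_letters
        try:
            handler(message)
        except Exception:
            if attempt + 1 < max_attempts:
                tasks.put((message, attempt + 1))   # retry later, at the back
            else:
                dead_letters.append(message)
```

Because a message may be delivered more than once, the handler should be idempotent; for example, sending a notification keyed by order ID so a repeat is detected and skipped.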

Point 4: Multi‑Stage Caching

Static asset CDN, distributed cache cluster, and cautious use of local server cache to avoid memory waste.

Point 5: Database Access Optimization

Optimize complex queries and reduce query count.

Read‑write separation (master‑slave) to alleviate master load.

Cache‑first strategy with safeguards against cache‑penetration.

Selective data redundancy for critical modules.

Employ NoSQL for massive data storage.
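
One common safeguard against cache penetration is to cache "not found" results for a short time, so that repeated lookups of nonexistent keys stop reaching the database. The sketch below illustrates that technique under stated assumptions: the source does not say which safeguard Vipshop uses, and the `ProductCache` class, its TTL values, and the in‑memory dictionary standing in for a distributed cache are hypothetical.

```python
_MISSING = object()   # sentinel marking "the database has no such key"


class ProductCache:
    """Cache-first lookup that also remembers misses for a short time."""

    def __init__(self, load_from_db, ttl: float = 60.0, missing_ttl: float = 5.0):
        self.load_from_db = load_from_db   # callable: key -> value or None
        self.ttl = ttl
        self.missing_ttl = missing_ttl
        self.entries = {}                  # key -> (value, expires_at)

    def get(self, key, now: float):
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]
            return None if value is _MISSING else value
        # Cache miss or expired entry: fall through to the database.
        value = self.load_from_db(key)
        if value is None:
            # Negative caching: remember the miss, but only briefly.
            self.entries[key] = (_MISSING, now + self.missing_ttl)
            return None
        self.entries[key] = (value, now + self.ttl)
        return value
```

The short `missing_ttl` is a trade‑off: it caps how long a newly created record can stay invisible while still absorbing bursts of requests for keys that do not exist.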

Point 6: Strengthened System Monitoring

Collect infrastructure metrics (CPU, memory, disk, network, TCP connections) and business metrics (PV, UV, conversion, cart, order, payment). Use Mercury, an in‑house APM, to instrument code, databases, and caches for real‑time performance insight.
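
Code‑level instrumentation of the kind an APM performs can be approximated with a timing decorator. The sketch below is a generic illustration and does not reflect Mercury's actual API, which the source does not describe; `timed`, `summarize`, and the in‑process `LATENCIES_MS` store are hypothetical names.

```python
import functools
import math
import time
from collections import defaultdict

LATENCIES_MS = defaultdict(list)   # metric name -> recorded latencies in ms


def timed(metric_name: str):
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures are visible too.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                LATENCIES_MS[metric_name].append(elapsed_ms)
        return wrapper
    return decorator


def summarize(metric_name: str):
    """Return count, mean, p95 (nearest rank), and max for one metric, or None."""
    samples = sorted(LATENCIES_MS.get(metric_name, []))
    if not samples:
        return None
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "count": len(samples),
        "mean_ms": sum(samples) / len(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Decorating a handler with, say, `@timed("order.create")` and calling `summarize("order.create")` periodically yields the kind of per‑operation latency view that sits alongside the infrastructure and business metrics listed above.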

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
