
Scaling Suning’s E‑Commerce for Double‑11: System Splitting and Resilience

Suning’s technical team shares how they prepared for Double‑11 by splitting monolithic services into focused modules, building a robust foundational platform with cloud, middleware, and monitoring tools, refining R&D processes, and implementing comprehensive load‑testing, optimization, and emergency response plans to ensure system stability under massive traffic.

21CTO

System Splitting

Suning’s online business consists of many high-TPS systems that have been scaled horizontally through continuous splitting and refactoring. Previously, the main site ran on a large IBM-based B2C platform with a massive codebase, long startup times, and frequent outages during promotions. To improve scalability and reduce risk, the team analyzed the core business flows, separated backbone services from peripheral ones, and created independent modules for membership, product, inventory, pricing, cart, transaction, order, and content management, as well as dedicated systems for flash sales, coupons, and marketing.

These smaller, clearly defined services can be deployed independently, allowing faster response to new business requirements and easier onboarding for new engineers.
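
To make "independently deployable" concrete, here is a minimal sketch that assumes each split-out system exposes a narrow contract. The names `InventoryService`, `PricingService`, and `CartService` and their methods are hypothetical, not Suning’s actual interfaces. The point is that the cart depends only on contracts, so the inventory or pricing system can be redeployed or replaced without touching cart code.

```python
from dataclasses import dataclass, field
from typing import Protocol


class InventoryService(Protocol):
    """Contract for a (hypothetical) separately deployed inventory system."""

    def available(self, sku: str) -> int: ...


class PricingService(Protocol):
    """Contract for a (hypothetical) separately deployed pricing system."""

    def price_cents(self, sku: str) -> int: ...


@dataclass
class CartService:
    """Cart module that talks to other modules only through their contracts."""

    inventory: InventoryService
    pricing: PricingService
    items: dict[str, int] = field(default_factory=dict)

    def add(self, sku: str, qty: int) -> bool:
        # Reject the line if inventory cannot cover the new total quantity.
        wanted = self.items.get(sku, 0) + qty
        if self.inventory.available(sku) < wanted:
            return False
        self.items[sku] = wanted
        return True

    def total_cents(self) -> int:
        # Prices come from the pricing system at read time; the cart never stores them.
        return sum(self.pricing.price_cents(s) * q for s, q in self.items.items())


# In-memory stand-ins; in production each would be a remote client.
class FakeInventory:
    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = stock

    def available(self, sku: str) -> int:
        return self.stock.get(sku, 0)


class FakePricing:
    def __init__(self, prices: dict[str, int]) -> None:
        self.prices = prices

    def price_cents(self, sku: str) -> int:
        return self.prices[sku]


cart = CartService(FakeInventory({"tv": 2}), FakePricing({"tv": 199900}))
print(cart.add("tv", 1), cart.add("tv", 2), cart.total_cents())  # True False 199900
```

Because the fakes satisfy the same protocols as real clients would, each module can also be tested in isolation, which is part of what makes small services easier for new engineers to work on.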

Basic Platform

After the split, inter-service communication and duplicated functionality became new challenges. Suning therefore set up a foundational R&D center to build shared infrastructure: an in-house CDN, cloud computing, cloud storage, messaging (MYESB), a remote call framework (RSF), session sharing, distributed tracing, Redis sharding, database sharding, unified configuration, flow control, monitoring, CI/CD, and big-data analysis platforms.
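
Redis sharding is often done on the client side with consistent hashing, so adding or removing a node remaps only a fraction of the keys. The source does not say which scheme Suning uses; the sketch below is a generic consistent-hash ring with virtual nodes, and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps keys to shard (node) names."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used here only for its even spread, not for security.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Many virtual points per node smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First virtual point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["redis-a", "redis-b", "redis-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("redis-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys now owned by the new node move (roughly a quarter, not all of them).
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

The naive alternative, `hash(key) % N`, remaps almost every key whenever N changes. That is why ring-based schemes are a common choice when cache nodes must be added before a promotion.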

The platform also provides security-related risk control systems. Key components include KVM virtualization and Docker containers for deployment, object and file storage, and monitoring tools for real-time log collection (ELK), performance metrics (LogMonitor), and distributed call-chain tracing (CloudyTrace).
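
Distributed tracing works by attaching a trace ID to every request and recording one span per hop, so logs from different systems can be joined into a single call chain. The source does not describe CloudyTrace’s internals, so this is a generic sketch. The header names `X-Trace-Id` and `X-Span-Id` and the `Span` structure are hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request's call chain
    span_id: str              # unique to this hop
    parent_id: Optional[str]  # span that called us; None at the entry point
    name: str
    start: float = field(default_factory=time.monotonic)
    duration: Optional[float] = None


COLLECTED: list[Span] = []  # stand-in for the log pipeline feeding a tracing backend


def start_span(name: str, incoming: dict[str, str]) -> Span:
    """Join the caller's trace if the headers carry one; otherwise start a new trace."""
    trace_id = incoming.get("X-Trace-Id") or uuid.uuid4().hex
    return Span(trace_id, uuid.uuid4().hex[:16], incoming.get("X-Span-Id"), name)


def finish_span(span: Span) -> None:
    span.duration = time.monotonic() - span.start
    COLLECTED.append(span)


def outgoing_headers(span: Span) -> dict[str, str]:
    """Headers to attach to a downstream call so the callee joins the same trace."""
    return {"X-Trace-Id": span.trace_id, "X-Span-Id": span.span_id}


# An order service handles a request and calls an inventory service.
order = start_span("order.submit", incoming={})
inventory = start_span("inventory.reserve", outgoing_headers(order))
finish_span(inventory)
finish_span(order)
for s in COLLECTED:
    print(s.trace_id[:8], s.name, "parent:", s.parent_id)
```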

R&D Process

Suning’s rapid growth produced hundreds of services of uneven quality. To keep quality consistent, the organization introduced rigorous review checklists and meetings at critical stages: product reviews, architecture design templates, code inspections, automated quality analysis (Sonar), and a “code fly-review”, an unannounced spot check in which senior architects and directors quickly assess core code.

Release checklists require sign-off from product owners, developers, QA, DBAs, operations, and online verification teams, which reduces deployment issues.
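
A checklist like this can be enforced mechanically by a pipeline gate that refuses to deploy until every required role has signed off. The role names below mirror the list above; the `ReleaseGate` class and the release name are a hypothetical sketch, not a description of Suning’s tooling.

```python
from dataclasses import dataclass, field

# Roles taken from the release checklist described above.
REQUIRED_ROLES = frozenset(
    {"product_owner", "developer", "qa", "dba", "operations", "online_verification"}
)


@dataclass
class ReleaseGate:
    release: str
    signoffs: dict[str, str] = field(default_factory=dict)  # role -> approver

    def sign(self, role: str, approver: str) -> None:
        if role not in REQUIRED_ROLES:
            raise ValueError(f"unknown role: {role}")
        self.signoffs[role] = approver

    def missing(self) -> set[str]:
        # Iterating a dict yields its keys, so this subtracts the signed roles.
        return set(REQUIRED_ROLES.difference(self.signoffs))

    def can_deploy(self) -> bool:
        return not self.missing()


gate = ReleaseGate("cart-service 2.3.1")
for role in ("product_owner", "developer", "qa", "dba", "operations"):
    gate.sign(role, "alice")
print(gate.can_deploy(), gate.missing())  # False {'online_verification'}
```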

System Assurance

For Double-11, Suning focused on two things: raising system load capacity and preparing emergency response plans. The traffic forecast was broken down into per-service PV, UV, and peak-TPS targets. Systems that fell short of their targets were optimized or scaled out, with optimization preferred over adding hardware.
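
The source does not give Suning’s formulas, but one common way to turn a request forecast into a peak-TPS target is the 80/20 heuristic: 80% of a day’s requests arrive in 20% of its seconds. A burst factor can be applied on top for promotion spikes. The node count then follows from the per-node capacity measured in load tests, kept below a target utilization. All numbers and parameter names here are illustrative assumptions.

```python
import math


def peak_tps(daily_requests: int, burst_factor: float = 1.0) -> float:
    """Estimate peak TPS with the 80/20 rule: 80% of requests in 20% of the day."""
    peak_seconds = 0.2 * 24 * 3600
    return 0.8 * daily_requests / peak_seconds * burst_factor


def nodes_needed(target_tps: float, node_tps: float, max_utilization: float = 0.7) -> int:
    """Nodes required so each runs at or below max_utilization of its tested capacity."""
    return math.ceil(target_tps / (node_tps * max_utilization))


# Hypothetical forecast for one service: 100 million requests, 3x promotion burst,
# 2,000 TPS per node as measured in a load test.
target = peak_tps(100_000_000, burst_factor=3.0)
print(round(target), nodes_needed(target, node_tps=2_000))  # 13889 10
```

Working the gap between `nodes_needed` and the current fleet size is what decides whether a system needs optimization or extra machines.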

Optimization proceeded top-down through health checks, architecture reviews, code tuning, and middleware adjustments. Specific improvements included moving a messaging system’s user lists from the database into Redis and sending push notifications to key cities first.
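
Sending to key cities first can be modeled as a priority queue: recipients in key cities are dequeued before everyone else, and order within each tier is preserved. The city list, the two-tier scheme, and the `PushQueue` class are hypothetical.

```python
import heapq
import itertools

KEY_CITIES = {"Nanjing", "Beijing", "Shanghai"}  # hypothetical priority tier


class PushQueue:
    """Dispatch pushes for key cities first; FIFO within each priority tier."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker that keeps insertion order

    def enqueue(self, user_id: str, city: str) -> None:
        priority = 0 if city in KEY_CITIES else 1
        heapq.heappush(self._heap, (priority, next(self._seq), user_id, city))

    def drain(self, batch_size: int) -> list[str]:
        """Pop up to batch_size user IDs in priority order."""
        batch = []
        while self._heap and len(batch) < batch_size:
            _, _, user_id, _ = heapq.heappop(self._heap)
            batch.append(user_id)
        return batch


q = PushQueue()
for uid, city in [("u1", "Hefei"), ("u2", "Nanjing"), ("u3", "Wuxi"), ("u4", "Beijing")]:
    q.enqueue(uid, city)
print(q.drain(3))  # ['u2', 'u4', 'u1']
```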

Middleware tuning covered JVM settings, connection pools, caching strategies, and read/write separation. Bottlenecks were found through load testing, sometimes run against production during low-traffic periods to get realistic results.
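
Read/write separation is typically handled by a routing layer in front of the connection pools. Writes go to the primary, reads are spread across replicas, and reads that must see a just-written value can be pinned to the primary to avoid replication lag. This is a minimal sketch; the data-source names and the `force_primary` flag are hypothetical, and real systems usually do this inside data-access middleware.

```python
import itertools
from dataclasses import dataclass

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE")


@dataclass
class ReadWriteRouter:
    primary: str
    replicas: list[str]

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the primary if none are configured.
        self._replicas = itertools.cycle(self.replicas or [self.primary])

    def route(self, sql: str, force_primary: bool = False) -> str:
        """Return the name of the data source that should execute this statement."""
        stmt = sql.lstrip().upper()
        # SELECT ... FOR UPDATE takes row locks, so it must run on the primary too.
        is_write = stmt.startswith(WRITE_PREFIXES) or (
            stmt.startswith("SELECT") and "FOR UPDATE" in stmt
        )
        if is_write or force_primary:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("UPDATE stock SET qty = qty - 1 WHERE sku = ?"))  # db-primary
print(router.route("SELECT * FROM stock"), router.route("SELECT * FROM stock"))
# db-replica-1 db-replica-2
```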

Emergency plans consist of blacklists (by IP or user), rate limiting (by IP, user, or URL), and multi-level degradation strategies such as disabling non-critical features, serving static pages, enforcing captchas, extending cache lifetimes, and disabling backend compensation mechanisms. The plans must be drilled regularly to stay reliable.
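
The first two layers of such a plan, blacklists and per-dimension rate limits, can be sketched as an admission check that runs before a request reaches business logic. The source names the dimensions (IP, user, URL) but not the algorithm or thresholds. The token-bucket algorithm, the limits, and the `Guard` class are assumptions for illustration.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills `rate` tokens per second and allows bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens, self.updated = capacity, now  # start full

    def allow(self, now: float) -> bool:
        # Add tokens for the elapsed time (capped), then try to spend one.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Guard:
    """Admission check: blacklists first, then one token bucket per (dimension, key)."""

    def __init__(self, limits: dict[str, tuple[float, float]]) -> None:
        self.limits = limits  # dimension ("ip" / "user" / "url") -> (rate, capacity)
        self.blacklist: dict[str, set[str]] = {"ip": set(), "user": set()}
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def admit(self, ip: str, user: str, url: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if ip in self.blacklist["ip"] or user in self.blacklist["user"]:
            return "blocked"
        for dim, key in (("ip", ip), ("user", user), ("url", url)):
            if dim not in self.limits:
                continue  # no limit configured for this dimension
            bucket = self.buckets.get((dim, key))
            if bucket is None:
                rate, capacity = self.limits[dim]
                bucket = self.buckets[(dim, key)] = TokenBucket(rate, capacity, now)
            # Simplification: tokens already spent on earlier dimensions are not refunded.
            if not bucket.allow(now):
                return f"limited:{dim}"
        return "ok"


guard = Guard({"ip": (1.0, 2.0)})  # hypothetical: 1 req/s per IP, bursts of 2
print([guard.admit("10.0.0.1", "alice", "/cart", now=0.0) for _ in range(3)])
# ['ok', 'ok', 'limited:ip']
guard.blacklist["user"].add("bot42")
print(guard.admit("10.0.0.2", "bot42", "/cart", now=0.0))  # blocked
```

Degradation levels would sit behind a similar switch consulted at the same point, so that drills can flip them without a redeploy.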

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
