How JD.com Guarantees Database Performance During Billion‑Scale Sales Events
This article describes how JD.com keeps its MySQL databases fast and highly available during massive traffic spikes such as the 618 and Double-11 sales. It covers architecture design, pre-event preparation, real-time safeguards, and post-event analysis.
Architecture and Load
JD.com's application architecture follows a mainstream design: user requests are accelerated by a CDN, then pass through an application cluster, a message queue (JMQ), and a cache cluster (JIMDB and Redis) before reaching the database cluster. Over 95% of production databases run MySQL; Oracle, SQL Server, and MongoDB serve a small number of specific systems.
The database proxy layer (Jproxy) handles traffic management, connection pooling, and read/write separation. JD.com still uses a traditional MySQL master-slave architecture because of its proven stability and its better performance compared to distributed solutions. More than 75% of production MySQL instances run in Docker containers.
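Jproxy's internals are not described here, so the following is only a minimal sketch of what read/write separation in a proxy can look like. The class name `ReadWriteRouter`, the host strings, and the routing rules (plain reads rotate across replicas; writes, locking reads, and anything inside a transaction go to the master) are assumptions made for illustration.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the master, plain reads rotate over replicas."""

    READ_PREFIXES = ("select", "show")

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._replica_cycle = itertools.cycle(replicas or [master])

    def route(self, sql, in_transaction=False):
        # Normalize case and drop trailing semicolons and whitespace.
        statement = sql.strip().lower().rstrip("; \t\n")
        is_read = statement.startswith(self.READ_PREFIXES)
        is_locking_read = statement.endswith("for update")
        # Writes, locking reads, and in-transaction statements need the master.
        if in_transaction or not is_read or is_locking_read:
            return self.master
        return next(self._replica_cycle)


# Hypothetical host names, used only to show the routing decisions.
router = ReadWriteRouter("master:3306", ["replica1:3306", "replica2:3306"])
print(router.route("SELECT * FROM orders WHERE id = 1"))
print(router.route("UPDATE orders SET status = 'paid' WHERE id = 1"))
```

A real proxy would parse statements properly and account for replication lag; prefix matching is used here only to keep the idea visible.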
During major sales events, network traffic can increase 2‑3×, while MySQL QPS can surge up to tenfold, stressing the database layer dramatically.
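To make the tenfold figure concrete, here is a rough sizing calculation. The baseline QPS, the per-replica capacity, and the 70% headroom target are invented numbers for illustration, not JD.com's; only the tenfold surge factor comes from the text above.

```python
import math


def replicas_needed(baseline_read_qps, surge_factor, per_replica_qps, headroom=0.7):
    """Estimate how many read replicas are needed to absorb a traffic surge.

    headroom is the fraction of a replica's capacity we are willing to use,
    leaving the rest as a safety margin.
    """
    peak_qps = baseline_read_qps * surge_factor
    usable_qps_per_replica = per_replica_qps * headroom
    return math.ceil(peak_qps / usable_qps_per_replica)


# Assumed inputs: 20,000 baseline read QPS, a 10x surge,
# and replicas that each sustain 15,000 QPS.
print(replicas_needed(20_000, 10, 15_000))  # 200,000 / 10,500 rounds up to 20
```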
Pre‑Promotion Preparation
Key steps before a sales event include:
Communication: Hold joint meetings with development teams to confirm critical systems, identify weak points, and define emergency and downgrade plans.
SQL Optimization: Identify and rewrite slow queries. JD.com's platform collects slow SQL with an enhanced version of Box's Anemometer tool and notifies query owners by email.
Capacity Expansion: Use an automated expansion platform to add master/slave instances, leveraging Docker for rapid deployment.
Data Archiving: Implement a data-migration pipeline that moves stale data to low-cost storage (TokuDB) and archives historical data.
Stress Testing: Conduct full-link and isolated system load tests, simulate order flows with CDN robots, and verify failover mechanisms.
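The slow-query step depends on grouping statements that differ only in their literal values, so that one problematic query shape is reported once with its total cost. The sketch below shows that idea in plain Python. It is not the code of Anemometer or of JD.com's platform, and the function names `fingerprint` and `top_slow_queries` and the sample entries are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Normalize a statement so queries differing only in literals group together."""
    text = sql.strip().lower()
    text = re.sub(r"'(?:[^'\\]|\\.)*'", "?", text)              # quoted strings
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)              # numeric literals
    text = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?+)", text)  # value lists
    return re.sub(r"\s+", " ", text)                            # whitespace runs


def top_slow_queries(entries, limit=3):
    """Rank query shapes by total time; entries are (sql, seconds) pairs."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in entries:
        stats = totals[fingerprint(sql)]
        stats[0] += 1
        stats[1] += seconds
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(shape, count, total) for shape, (count, total) in ranked[:limit]]


# Made-up slow-log entries for illustration.
sample = [
    ("SELECT * FROM orders WHERE id = 42", 1.5),
    ("select * from orders where id = 7", 2.0),
    ("SELECT name FROM users WHERE city = 'Beijing'", 0.5),
]
for shape, count, total in top_slow_queries(sample):
    print(f"{total:.1f}s over {count} call(s): {shape}")
```

Ranking by total time rather than by the single slowest execution tends to surface the queries whose rewrite gives the largest saving before a traffic peak.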
During the Promotion – Real‑Time Safeguards
Operations run 24/7 with strict change‑control: any new deployment requires VP‑level approval and dual‑person DBA verification. Critical DBAs and developers co‑locate to accelerate issue resolution.
The automated operations platform manages asset inventory, Docker containers, MySQL cluster provisioning, automatic MHA failover, DNS updates, backup/restore, and self‑service deployment, dramatically reducing manual DBA workload.
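As a rough picture of what automatic failover followed by a DNS update involves, the sketch below promotes the healthy replica that has read furthest into the old master's binary log and repoints a service name at it. This is a simplification under stated assumptions, not MHA's implementation or JD.com's platform code: the `Replica` fields, the `fail_over` helper, the dictionary standing in for DNS, and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str
    master_log_file: str  # binlog file being read, e.g. "mysql-bin.000042"
    read_pos: int         # position reached within that file
    healthy: bool = True


def pick_new_master(replicas):
    """Choose the healthy replica with the most advanced binlog coordinates."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")

    def coordinates(replica):
        # Compare by numeric file suffix first, then by position in the file.
        file_number = int(replica.master_log_file.rsplit(".", 1)[1])
        return (file_number, replica.read_pos)

    return max(candidates, key=coordinates)


def fail_over(dns, service_name, replicas):
    """Return a copy of the DNS mapping with the service pointed at the new master."""
    new_master = pick_new_master(replicas)
    updated = dict(dns)
    updated[service_name] = new_master.host
    return updated


# Hypothetical cluster state after the old master has failed.
replicas = [
    Replica("db-replica-1", "mysql-bin.000042", 1200),
    Replica("db-replica-2", "mysql-bin.000042", 5300),
    Replica("db-replica-3", "mysql-bin.000043", 100, healthy=False),
]
print(fail_over({"orders-db": "db-master"}, "orders-db", replicas))
```

A production failover also has to fence the old master and bring lagging replicas up to date before they follow the new one; those steps are omitted here.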
Post‑Promotion Review and Continuous Improvement
After each event, JD.com conducts a thorough retrospective, documenting successes and failures. The focus shifts to enhancing visualization, automation, and intelligent platforms, aligning DBA work with SRE principles and software‑engineer skill sets.
21CTO
