
How JD.com Scaled MySQL with Docker: From Early Trials to 70% Production Deployment

This article recounts JD.com's journey of Dockerizing MySQL, covering the evolution of its container platform, reasons for container adoption, preparation steps, encountered challenges with large‑scale clusters, and the solutions that enabled over 70% of its MySQL instances to run reliably in Docker containers.

dbaplus Community

JD.com Docker Technology Evolution

JD.com began building a virtualization platform in 2013 using OpenStack + KVM. Performance (TP99 > 40 ms) was insufficient for core services. In September 2014, senior architect Liu Haifeng introduced Docker. After minimal development, TP99 fell to around 40 ms, prompting a shift to containerization.

Leveraging OpenStack expertise, JD.com created the first‑generation container engine JDOS 1.0 (JD DataCenter OS), which combined OpenStack orchestration with Docker runtime. By the 2015‑2016 “618” promotion, the platform achieved 100 % containerization of application workloads and scaled to roughly 150,000 Docker nodes, making it one of the world’s largest Docker clusters.

Why Dockerize MySQL at JD.com

MySQL usage grew rapidly from 2011, becoming the primary database for transaction systems by 2015. Docker offered four concrete benefits:

Rapid provisioning: A MySQL instance can be created in about one minute, dozens of times faster than installing a physical OS.

Dynamic scaling: A container's CPU and memory can be expanded online without a reboot, though disk capacity still requires careful planning.

Higher resource utilization: Containers isolate workloads at the process level, reducing CPU, memory and I/O contention compared with multiple MySQL instances sharing a single host.

Cost reduction: Better server and rack utilization lowers hardware and operational expenses.

JD.com’s mature Docker ecosystem—built through multiple large‑scale sales events (618, Double 11)—provided the reliability needed for production MySQL workloads.

Preparation Work Before Dockerizing MySQL

Docker Management UI

A web‑based portal was developed to let DBAs create, pause, start, and online‑scale MySQL containers with a few clicks, reducing manual effort.

Container Allocation Algorithm

The scheduler ensures high availability by preventing master‑slave pairs from being placed on the same host. Selection criteria include:

Host health status (alive, reachable).

Available resources (CPU, memory, disk).

Weight calculation based on resource usage.

Deduplication logic that excludes a host already chosen for a previous container in the same cluster.
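
The selection criteria above can be sketched as a small placement function. This is a minimal illustration, not JD.com's scheduler: the Host and Request types, the weight formula (tightest remaining headroom after placement), and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    alive: bool          # health status: alive and reachable
    free_cpu: int        # free cores
    free_mem_gb: int
    free_disk_gb: int


@dataclass
class Request:
    cpu: int
    mem_gb: int
    disk_gb: int


def fits(host: Host, req: Request) -> bool:
    """True if the host has enough free CPU, memory and disk for the request."""
    return (host.free_cpu >= req.cpu
            and host.free_mem_gb >= req.mem_gb
            and host.free_disk_gb >= req.disk_gb)


def weight(host: Host, req: Request) -> float:
    """Hypothetical weight: the smallest fraction of headroom left after placement."""
    return min(
        (host.free_cpu - req.cpu) / host.free_cpu,
        (host.free_mem_gb - req.mem_gb) / host.free_mem_gb,
        (host.free_disk_gb - req.disk_gb) / host.free_disk_gb,
    )


def place_cluster(hosts: list, req: Request, replicas: int) -> list:
    """Pick one distinct host per container (master plus slaves) of one cluster."""
    chosen = []
    for _ in range(replicas):
        used = {h.name for h in chosen}  # deduplication within the same cluster
        candidates = [h for h in hosts
                      if h.alive and fits(h, req) and h.name not in used]
        if not candidates:
            raise RuntimeError("not enough eligible hosts for this cluster")
        chosen.append(max(candidates, key=lambda h: weight(h, req)))
    return chosen
```

Because a host already chosen for the cluster is excluded from later rounds, a master and its slave can never land on the same machine; if too few hosts qualify, the sketch fails loudly instead of co-locating them.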

Template and I/O‑Aware Scheduling

MySQL container templates (e.g., 8C/12G/500G, 12C/24G/500G) are defined in advance. The scheduler prefers hosts with low current I/O load for containers that request high I/O performance, thereby reducing cross‑container interference.
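
As a rough sketch of how template names and I/O-aware selection might fit together, the code below parses a template string such as 8C/12G/500G and picks a host from current I/O utilization figures. The IO_CEILING threshold and the rule for ordinary requests (pack onto busier hosts so quiet ones stay free) are assumptions for illustration; the source only states that high-I/O requests prefer hosts with low current I/O load.

```python
IO_CEILING = 0.7  # hypothetical utilization limit for placing ordinary containers


def parse_template(name: str) -> dict:
    """Turn a template name like '8C/12G/500G' into numeric resource sizes."""
    cpu, mem, disk = name.split("/")
    return {
        "cpu": int(cpu.rstrip("C")),
        "mem_gb": int(mem.rstrip("G")),
        "disk_gb": int(disk.rstrip("G")),
    }


def pick_host(io_util_by_host: dict, high_io: bool) -> str:
    """Choose a host name given each host's current I/O utilization (0.0 to 1.0)."""
    if not io_util_by_host:
        raise ValueError("no candidate hosts")
    quietest = min(io_util_by_host, key=io_util_by_host.get)
    if high_io:
        # High-I/O containers go to the host with the lowest current I/O load.
        return quietest
    # Assumption: ordinary containers take the busiest host still under the
    # ceiling, which keeps quiet hosts available for high-I/O requests.
    below = {h: u for h, u in io_util_by_host.items() if u < IO_CEILING}
    if not below:
        return quietest
    return max(below, key=below.get)
```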

Integration with DB Management Platform

APIs were added to the existing database management system to support batch provisioning, decommissioning, and queries such as:

Given a host IP, list all MySQL containers running on it.

Given a container IP, retrieve its host and related metadata.
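
The two lookups above amount to keeping a two-way index between hosts and containers. A minimal in-memory sketch follows; the ContainerIndex class, its method names, and the metadata fields are hypothetical and do not describe the actual platform API.

```python
from collections import defaultdict


class ContainerIndex:
    """Two-way mapping between host IPs and the MySQL containers they run."""

    def __init__(self):
        self._by_host = defaultdict(dict)   # host_ip -> {container_ip: record}
        self._by_container = {}             # container_ip -> record

    def register(self, container_ip: str, host_ip: str, **metadata) -> None:
        """Record a provisioned container; re-registering replaces the old entry."""
        self.decommission(container_ip)
        record = {"container_ip": container_ip, "host_ip": host_ip, **metadata}
        self._by_container[container_ip] = record
        self._by_host[host_ip][container_ip] = record

    def decommission(self, container_ip: str) -> None:
        """Remove a container from both indexes; unknown IPs are ignored."""
        record = self._by_container.pop(container_ip, None)
        if record is not None:
            del self._by_host[record["host_ip"]][container_ip]

    def containers_on_host(self, host_ip: str) -> list:
        """Given a host IP, list the container IPs running on it."""
        return sorted(self._by_host.get(host_ip, {}))

    def host_of(self, container_ip: str):
        """Given a container IP, return its record (host and metadata) or None."""
        return self._by_container.get(container_ip)
```

Batch provisioning and decommissioning would then reduce to looping over register and decommission for a list of containers.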

Monitoring Adjustments

JD.com uses Zabbix for MySQL metrics. Because Docker containers do not expose host‑level load via standard OS commands, a custom agent runs on each host, aggregates load data, stores it in Redis, and lets Zabbix pull the values from Redis.
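
The agent-to-Redis hand-off described above might look roughly like the following. This is a sketch under stated assumptions: the key format hostload:<ip>, the JSON payload, and the function names are invented for illustration, and InMemoryStore stands in for a Redis client so the example runs without a server (a real client's set and get calls could be passed in its place).

```python
import json
import os
import time


class InMemoryStore:
    """Stand-in for a Redis client, exposing only set() and get()."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def publish_host_load(store, host_ip, read_load=None, now=None):
    """Run on the physical host: sample load averages and write them to the store."""
    read_load = read_load or os.getloadavg   # Unix-only standard library call
    now = now or time.time
    load1, load5, load15 = read_load()
    payload = {"load1": load1, "load5": load5, "load15": load15, "ts": now()}
    store.set(f"hostload:{host_ip}", json.dumps(payload))


def read_host_load(store, host_ip):
    """Run on the monitoring side: fetch the latest sample, or None if absent."""
    raw = store.get(f"hostload:{host_ip}")
    return None if raw is None else json.loads(raw)
```

Storing a timestamp alongside the values lets the monitoring side detect a stalled agent by checking how old the latest sample is.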

Problems Encountered and Solutions

OpenStack scaling limits: Beyond 10,000 physical nodes, messages were lost and agents hung. JD.com built a custom Python RPC framework named brood to replace the message queue (MQ), and used the internal JIMDB cache for DB operations, eliminating the bottleneck.

Kernel‑level bugs: Issues such as MAC table overflow, slab memory contention, and UDP packet loss surfaced at large scale. An internal Linux‑kernel team created a JD‑specific kernel branch with patches to address these problems.

Zabbix agent reliability: Agents sometimes failed to restart after a host reboot. An rc.local entry was added to ensure the Zabbix agent starts automatically.

Disk I/O interference: High‑I/O containers could degrade performance of co‑located containers. The scheduler now isolates I/O‑intensive workloads onto separate hosts.

Metric discrepancy between Docker and physical hosts: Load values collected inside containers differ from host values. The monitoring pipeline was adjusted to fetch host‑level metrics via the custom agent and store them centrally.

Current Deployment and Outlook

As of the latest release, over 70 % of JD.com’s MySQL instances run inside Docker containers, supporting multiple major sales events with stable performance. The fleet comprises roughly 150,000 containers, ranking among the world’s largest Docker deployments.

Future work includes:

Further automation of online scaling (reducing manual trigger steps).

Continued kernel optimization and maintenance of the JD‑specific branch.

Extending containerization to remaining critical workloads as Docker technology matures.

Key Technical Q&A (Condensed)

When to adopt MySQL Docker: Start with low‑impact services once the Docker platform is proven stable; high‑throughput or latency‑sensitive databases may remain on bare metal until Docker maturity improves.

Cost estimation: Calculate based on CPU cores, memory, and disk size per template (e.g., 8C/12G/500G, 12C/24G/500G, 12C/48G/1000G, 16C/48G/1000G).

I/O limits: CPU and memory can be strictly isolated; I/O shares the underlying hardware and may still cause interference, mitigated by host‑level scheduling.

Template modification: Templates can be upgraded if the host has sufficient free resources (e.g., increasing disk from 500 GB to 1 TB).

Data archiving: Active data is migrated to a historical MySQL cluster; older data can be offloaded to HBase or Hadoop.

Backup strategy: Use mysqldump or xtrabackup inside the container, store backups locally, then upload to external storage and schedule periodic snapshots.

OLAP suitability: Small analytical workloads can run on MySQL replicas; large‑scale analytics should use dedicated big‑data platforms (Hadoop, etc.).

Middleware vs. application‑level sharding: Some services use middleware; others implement sharding directly in the application.

Auto‑scaling: Current online scaling requires manual trigger; full automatic scaling is limited by host resource availability.

Volume sharing: Database containers use host‑local storage; shared Docker volumes are avoided to prevent data consistency issues.
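
The template modification answer above comes down to comparing the extra resources a larger template needs with what the host still has free. A minimal sketch, assuming template names follow the xC/yG/zG pattern used in this article; the function names and the host_free dictionary layout are hypothetical.

```python
def parse_template(name: str) -> dict:
    """Turn a template name like '12C/24G/500G' into numeric resource sizes."""
    cpu, mem, disk = name.split("/")
    return {
        "cpu": int(cpu.rstrip("C")),
        "mem_gb": int(mem.rstrip("G")),
        "disk_gb": int(disk.rstrip("G")),
    }


def can_upgrade(current: str, target: str, host_free: dict) -> bool:
    """True if the host's free resources cover the growth from current to target."""
    cur, tgt = parse_template(current), parse_template(target)
    # Only the difference between the two templates must come from free capacity.
    extra = {key: tgt[key] - cur[key] for key in cur}
    return all(extra[key] <= host_free[key] for key in extra)
```

For example, moving from 12C/24G/500G to 12C/48G/1000G needs 24 GB more memory and 500 GB more disk on the same host, and no additional cores.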
