
Scaling Mogujie's Private Cloud for 11.11: Architecture, Stability & Ops Insights

This article details how Mogujie's private cloud platform, built on OpenStack, Docker, and KVM, was engineered and optimized to handle the massive traffic of the 11.11 shopping festival, covering architectural choices, stability measures, monitoring, disaster recovery, performance tuning, and integration with existing operations systems.

21CTO

For Mogujie, the annual 11.11 shopping festival is the biggest test of system stability, disaster recovery, and rapid fault handling. Its private cloud platform was developed over a year and has been validated through three major promotions. This article describes the platform from three angles: architecture, technology selection, and application.

Technical Architecture

The platform provides internal business teams with a foundational IaaS/PaaS service built on Docker‑based CaaS and KVM‑based IaaS. OpenStack is used to manage both containers and virtual machines, while Docker offers lightweight, fast‑starting, standardized packaging and image‑based gray‑release capabilities. KVM handles workloads requiring stronger isolation and security.

Stability Measures

Key stability improvements include upgrading the kernel to version 2.6.32‑504 to fix network namespace crashes, disabling device‑mapper discard to avoid random kernel crashes, and prohibiting disk over‑provisioning that could render filesystems read‑only.
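
As a rough illustration of the kernel requirement above, a host pre-flight check could compare the running kernel release against 2.6.32-504. This is a minimal sketch, not Mogujie's tooling: the function names are hypothetical, and the comparison naively treats any numerically higher release as containing the fix, which is an assumption.

```python
import re

# 2.6.32-504 is the release named in the article; everything else here is illustrative.
MIN_KERNEL = (2, 6, 32, 504)


def parse_kernel_release(release: str) -> tuple:
    """Turn a release string such as '2.6.32-504.el6.x86_64' into (2, 6, 32, 504)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing build number (e.g. plain '2.6.32') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def kernel_is_patched(release: str) -> bool:
    """Assumption: any release at or above MIN_KERNEL carries the namespace fix."""
    return parse_kernel_release(release) >= MIN_KERNEL
```

On a live host the input would typically come from `platform.release()` in the standard library, for example `kernel_is_patched(platform.release())`.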

Monitoring Enhancements

A custom container‑level monitoring tool calculates load per container for fine‑grained QPS throttling and replaces host‑wide commands (top, free, iostat, uptime) with container‑aware equivalents. Host monitoring adds multi‑dimensional thresholds for process health, kernel logs, PID counts, network connections, and OOM alerts.
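
To make the idea of a container-aware `free` concrete, the sketch below derives memory figures from a container's cgroup v1 memory controller files (`memory.limit_in_bytes`, `memory.usage_in_bytes`, `memory.stat`) instead of host-wide `/proc/meminfo`. The article does not describe how Mogujie's tool is implemented; the function names and the cgroup directory passed in are assumptions for illustration.

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def container_free(limit_bytes: int, usage_bytes: int, stat: dict) -> dict:
    """Build a free-like view scoped to one container.

    In cgroup v1, usage includes page cache, so 'used' subtracts the cache.
    """
    cache = stat.get("cache", 0)
    return {
        "total": limit_bytes,
        "used": max(usage_bytes - cache, 0),
        "cache": cache,
        "free": max(limit_bytes - usage_bytes, 0),
    }


def read_container_free(cgroup_dir: str) -> dict:
    """Read the three files from a container's memory cgroup directory.

    The directory layout varies by host setup; the caller supplies the path.
    """
    base = Path(cgroup_dir)
    limit = int((base / "memory.limit_in_bytes").read_text())
    usage = int((base / "memory.usage_in_bytes").read_text())
    stat = parse_memory_stat((base / "memory.stat").read_text())
    return container_free(limit, usage, stat)
```

Keeping the parsing and arithmetic separate from file access makes the calculation easy to check with fixed inputs; the same pattern would apply to CPU or I/O counters from their respective cgroup controllers.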

Disaster Recovery and Emergency Handling

Disaster recovery strategies include offline data recovery for Docker using dmsetup create to mount temporary device‑mapper devices, and support for cold migration of containers across physical hosts via a one‑click management interface.
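
The `dmsetup create` step can be sketched as follows. A thin-provisioned device is activated with a table line of the form `0 <length in sectors> thin <pool device> <device id>`. The sketch assumes the device id and size are available as JSON fields named `device_id` and `size` (in bytes), which is how Docker's devicemapper driver is understood to store per-device metadata; the pool path and device name in the usage note are hypothetical. It only builds the command and does not run it.

```python
import json
import shlex

SECTOR_BYTES = 512  # device-mapper table lengths are given in 512-byte sectors


def thin_table(metadata_json: str, pool_device: str) -> str:
    """Build the device-mapper table line for one thin device.

    Assumes metadata_json holds 'device_id' and 'size' (bytes) fields.
    """
    meta = json.loads(metadata_json)
    sectors = meta["size"] // SECTOR_BYTES
    return f"0 {sectors} thin {pool_device} {meta['device_id']}"


def dmsetup_create_command(name: str, metadata_json: str, pool_device: str) -> list:
    """Return the argv for 'dmsetup create'; the caller decides whether to run it."""
    return ["dmsetup", "create", name, "--table", thin_table(metadata_json, pool_device)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = '{"device_id": 42, "size": 10737418240}'
    argv = dmsetup_create_command("recover-tmp", sample, "/dev/mapper/docker-pool")
    print(shlex.join(argv))
```

Once such a device exists it appears under `/dev/mapper/` with the chosen name and can be mounted read-only to copy data out, provided the thin pool itself is active.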

Integration with Existing Operations Systems

The Docker cluster integrates seamlessly with existing operation tools, enabling unified container management and achieving container creation within seven seconds.

Performance Optimizations

System‑level Docker optimizations involve tuning kernel parameters such as vm.dirty_expire_centisecs, vm.dirty_writeback_centisecs, and vm.extra_free_kbytes, and deploying Facebook’s flashcache to use SSD as a cache, dramatically improving I/O performance. Image pull times were reduced by flattening layer hierarchies, cutting size from 1.051 GB (13 layers, 2 min 13 s) to 674.4 MB (1 layer, 26 s).
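
The image figures above can be put side by side with a few lines of arithmetic. The sketch uses only the numbers reported in the article and assumes that GB here means 1024 MB; with 1000 MB per GB the size reduction comes out slightly lower.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction from 'before' to 'after'."""
    return (before - after) / before


size_before_mb = 1.051 * 1024   # 13-layer image; assumes 1 GB = 1024 MB
size_after_mb = 674.4           # flattened 1-layer image
pull_before_s = 2 * 60 + 13     # 2 min 13 s
pull_after_s = 26

size_cut = reduction(size_before_mb, size_after_mb)
time_cut = reduction(pull_before_s, pull_after_s)
speedup = pull_before_s / pull_after_s

print(f"size reduced by {size_cut:.1%}")       # size reduced by 37.3%
print(f"pull time reduced by {time_cut:.1%}")  # pull time reduced by 80.5%
print(f"pull speedup {speedup:.1f}x")          # pull speedup 5.1x
```

Pull time fell far more than image size did. One plausible reading is that per-layer overhead, rather than raw bytes, dominated the original pull, though the article does not break the timing down.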

Conclusion

The 11.11 event served as a comprehensive test of Mogujie's private cloud. While the platform has proven stable, ongoing challenges include container isolation, elastic scheduling, and future adoption of technologies such as Kubernetes, Mesos, CRIU, and runC for hot migration and daemon upgrades.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
