How We Built a Hybrid Container‑VM Private Cloud: Lessons from a Large‑Scale Deployment
This article details the challenges and solutions encountered while transitioning a rapidly growing financial services platform from a VM‑centric private cloud to a hybrid environment that combines containers and virtual machines, covering network integration, IP management, container image standards, resource isolation, scheduling compatibility, and future lightweight container strategies.
Background
Tiánchéng Financial has a growing portfolio of financial and derivative products, leading to rapid expansion of its application systems and micro‑services. The backend operations platform must therefore support high concurrency, high availability, and high performance.
Introduction
Exploratory business systems have unstable lifecycles, unpredictable scale, and periodic traffic peaks. The team therefore considered container technology to accelerate development while retaining its mature VM‑based operations framework. A seamless migration required extensive adaptations to the infrastructure, systems, and network.
Considerations and Solutions
Network Layer
To remain compatible with existing services and operations, three conditions were required:
Each container must have an independent IP address.
The IP address must persist after container restarts.
The container network must communicate directly with the existing physical network.
Two approaches were evaluated:
Contiv (Cisco Docker network plugin)
Contiv could provide IP‑level control similar to Cisco ACI, but it had three drawbacks: its SDN configuration was independent of ACI, so every change had to be configured twice, and the beta version exhibited unstable bandwidth.
Macvlan
Macvlan creates virtual sub‑interfaces on a physical NIC, each with its own MAC address, so every container gets a unique IP and can communicate directly with the physical network. It is a Linux kernel feature, stable and straightforward to use.
After comparing stability, maintainability, and operational complexity, the team selected macvlan for cross‑host container networking.
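As a rough illustration of the macvlan choice, the sketch below builds the Docker CLI invocations for creating a macvlan network on a host NIC and for starting a container with a fixed IP on it. The network name, parent interface, subnet, gateway, and image are hypothetical; the article does not give the team's actual values or tooling, so treat this as one possible setup rather than the deployed configuration.

```python
import ipaddress


def macvlan_network_cmd(name, parent_iface, subnet, gateway):
    """Build the `docker network create` argv for a macvlan network.

    All argument values are supplied by the caller; none come from the article.
    """
    net = ipaddress.ip_network(subnet)
    # Reject a gateway that does not belong to the subnet before calling Docker.
    if ipaddress.ip_address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is outside {subnet}")
    return [
        "docker", "network", "create",
        "--driver", "macvlan",
        "--subnet", str(net),
        "--gateway", gateway,
        "-o", f"parent={parent_iface}",  # physical NIC the sub-interfaces attach to
        name,
    ]


def run_with_fixed_ip_cmd(network, ip, image, container_name):
    """Build the `docker run` argv that pins a container to one IP."""
    return [
        "docker", "run", "-d",
        "--name", container_name,
        "--network", network,
        "--ip", ip,  # the same IP can be passed again when the container is recreated
        image,
    ]


if __name__ == "__main__":
    # Hypothetical values, printed rather than executed.
    print(" ".join(macvlan_network_cmd("app-net", "eth0", "10.20.0.0/24", "10.20.0.1")))
    print(" ".join(run_with_fixed_ip_cmd("app-net", "10.20.0.15", "registry.example/app:1.0", "app-1")))
```

Passing the same `--ip` value whenever a container is recreated is what would satisfy the requirement that the address persists across restarts.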
Container Layer
Image Standard : Instead of minimal single‑process images, the team builds images based on Red Hat, adding SSH, agents, and common diagnostic tools to resemble VM environments.
Process Management : Supervisord is launched inside containers to manage multiple processes, avoiding custom scripts and simplifying configuration.
Resource Isolation :
Memory : cgroup limits are set higher than the JVM heap, because the JVM also uses memory outside the heap; swappiness is minimized.
CPU : cgroup limits reserve 1–2 cores for the host; containers share the remaining cores.
Disk : OverlayFS is used; monitoring scripts enforce per‑container size limits.
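The memory and CPU rules above can be sketched as a small helper that derives `docker run` resource flags from a JVM heap size and the host's core count. The 25% headroom, the default of two reserved cores, and the use of Docker flags at all are assumptions for illustration; the article does not say how the team sets its cgroup limits, and `--memory-swappiness` may not be available on every cgroup setup.

```python
def resource_flags(jvm_heap_mb, host_cpus, reserved_cores=2, overhead_ratio=0.25):
    """Return `docker run` flags for memory and CPU isolation.

    reserved_cores and overhead_ratio are illustrative defaults,
    not values taken from the article.
    """
    if reserved_cores >= host_cpus:
        raise ValueError("no cores left for containers")

    # The limit must exceed the heap: the JVM also needs room for
    # metaspace, thread stacks, and native buffers.
    limit_mb = int(jvm_heap_mb * (1 + overhead_ratio))

    # Cores 0..reserved_cores-1 stay with the host; containers share the rest.
    first, last = reserved_cores, host_cpus - 1
    cpuset = str(first) if first == last else f"{first}-{last}"

    return [
        "--memory", f"{limit_mb}m",
        "--memory-swappiness", "0",  # keep the container's pages out of swap
        "--cpuset-cpus", cpuset,
    ]


if __name__ == "__main__":
    # A 4 GiB heap on a 16-core host (hypothetical sizes).
    print(resource_flags(4096, 16))
```

Pinning containers to a shared cpuset rather than giving each a fixed quota matches the "share the remaining cores" wording, at the cost of letting a busy container compete with its neighbors.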
Scheduling Layer
The platform supports multiple schedulers. Whether a new instance runs on a VM or a Docker node, the platform automatically selects an idle physical node for it. In low‑risk environments, Docker Swarm is used instead of Kubernetes to reduce deployment cost.
IP Management : IP addresses are treated as a global resource recorded in the cloud platform and allocated to application instances, preventing random IP assignment.
Instance Failover : When a node fails, the platform decommissions the faulty node and launches a new instance with the same IP, enabling migration between VM and container nodes via the macvlan network.
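A minimal sketch of how IP management and failover could fit together: a global pool records which instance owns which address, and failover re-places the instance on the least-loaded healthy node while returning the same IP. The class, function names, node names, and load figures are hypothetical; the article does not describe the platform's actual data model or placement policy.

```python
import ipaddress


class IPPool:
    """Global registry mapping application instances to fixed IPs."""

    def __init__(self, cidr):
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._by_instance = {}

    def allocate(self, instance_id):
        # Idempotent: an instance that already holds an IP keeps it.
        if instance_id in self._by_instance:
            return self._by_instance[instance_id]
        if not self._free:
            raise RuntimeError("IP pool exhausted")
        ip = self._free.pop(0)
        self._by_instance[instance_id] = ip
        return ip

    def release(self, instance_id):
        # Return the address to the pool when the instance is retired.
        self._free.append(self._by_instance.pop(instance_id))


def pick_idle_node(load_by_node, exclude=()):
    """Pick the least-loaded node that is not excluded."""
    candidates = {n: load for n, load in load_by_node.items() if n not in exclude}
    if not candidates:
        raise RuntimeError("no healthy node available")
    return min(candidates, key=candidates.get)


def fail_over(pool, placement, instance_id, load_by_node):
    """Move an instance off its failed node while keeping its IP."""
    failed = placement[instance_id]
    target = pick_idle_node(load_by_node, exclude={failed})
    placement[instance_id] = target
    return target, pool.allocate(instance_id)


if __name__ == "__main__":
    pool = IPPool("10.20.0.0/29")
    placement = {"order-svc-1": "vm-node-a"}
    print(pool.allocate("order-svc-1"))
    load = {"vm-node-a": 0.1, "docker-node-b": 0.4, "docker-node-c": 0.2}
    print(fail_over(pool, placement, "order-svc-1", load))
```

Keying the pool by instance rather than by node is what lets the replacement land on either a VM or a container host without the address changing.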
Future Outlook
Lightweight Exploration : The current containers are “fat” containers; the team plans to move toward single‑process containers that align with the traditional “one container, one process” model.
Data‑Layer Exploration : While containerization has been applied at the application layer, persistent services such as Redis, MongoDB, and MySQL still lack mature solutions for consistency and high availability in a multi‑node container environment.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.