How China Merchants Bank Built a Multi‑Cloud, Hybrid Cloud Native Platform in 3 Years
This article details China Merchants Bank’s three‑year journey from a traditional IT backbone to a fully cloud‑native, multi‑cloud and hybrid cloud environment. It covers why the bank moved to the cloud, why it chose a native, full‑stack approach, how its private cloud platform evolved, the main challenges and risk mitigations, and the lessons learned for large financial institutions.
1. Background and Cloud Selection
Before starting, three questions were raised: why move to the cloud, why choose a native full‑stack cloud, and why adopt multi‑cloud and hybrid cloud.
In 2015, a team from China Merchants Bank (CMB) visited Silicon Valley and concluded that cloud computing would be a decisive force for IT transformation. The bank needed an advanced private cloud on which to build a technology system that is flat, open, shared, and agile, and that supports an integrated, fast‑growing business.
After the 2015 decision, a three‑year private‑cloud construction phase (2017‑2020) prepared the environment for a three‑year full‑cloud migration (2020‑2022).
1.1 Why Move to Cloud
Traditional mainframe and open systems (e.g., IBM S/390, AS/400, Tandem) had become bottlenecks for innovation: they scaled poorly, ran on outdated technology stacks, and lacked elasticity.
Two insights emerged: technology leadership has shifted from vendor‑centric to open‑source‑centric, and financial‑industry cloud transformation requires autonomous innovation.
1.2 Why Native Full‑Stack Cloud
Being cloud native places demands on architecture, organization, processes, and tools alike. Public clouds such as AWS exemplify the model, so a private cloud should emulate their best practices.
Full‑stack cloud means providing both IaaS and PaaS capabilities to support developers, service creators, and business users.
1.3 Why Multi‑Cloud and Hybrid Cloud
Following the Bimodal IT model, CMB combines stable (steady‑state) and agile (cloud‑native) architectures, creating a "steady‑state cloud + agile cloud" strategy.
Hybrid cloud is defined as the integration of public and private clouds, realized as a unified multi‑cloud platform with high availability, security, and avoidance of vendor lock‑in.
1.4 Private‑Cloud Features
Two‑Flower Cloud: a stable cloud for the financial core and an open, agile native cloud.
One Cloud, Dual Stack: an X86‑based general zone and a trusted zone (Xinchuang, the domestic IT innovation stack).
One Cloud, Multiple Chips: X86 chips in the general zone and domestic chips in the trusted zone.
Open + Cloud‑Native: containerization, micro‑services, DevOps, Serverless, etc.
2. Cloud Construction Timeline and Current Status
The ACS IaaS platform evolved as follows:
ACS 1.0 (2015‑2018): private‑cloud decision, first cloud lab, branch‑level cloud pilot.
ACS 2.0 (2019): native cloud upgrade to 3‑AZ architecture, improved scale, availability, security.
ACS 2.1 (2020): full‑bank cloud decision, covering both steady‑state and agile workloads.
ACS 3.0 (2022): enterprise‑wide cloud with dual stack, dual chip, IPv4+IPv6 support, 100% migration.
The ACS PaaS platform progressed from Pivotal Cloud Foundry (2015) to Red Hat OpenShift (2018) and then to a self‑developed CMB K8S (2022), achieving >90% containerization and >99.995% availability.
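To make the availability figure concrete, the arithmetic below converts an availability percentage into a yearly downtime budget. This is only a worked calculation under the assumption of a 365‑day year; the function name is illustrative and not part of any CMB tooling. At 99.995% the budget comes to roughly 26 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_annual_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Compare a few common targets with the figure quoted in the article.
    for target in (99.9, 99.99, 99.995):
        minutes = max_annual_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```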
Application migration accelerated after full‑cloud launch: 700+ systems in 2020, 1,500+ in 2021, and 100% of systems by September 2022, including core mobile banking apps.
Current scale: dual‑region + 11‑AZ architecture, >20 000 physical servers, >400 000 container instances, and a private‑cloud platform that matches or exceeds traditional mainframe availability.
3. Problems and Risks
3.1 Security
Physical isolation of DMZ and BIZ zones is required by regulators; virtual isolation (VPC, security groups) is used where permissible.
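Virtual isolation with security groups can be pictured as a default‑deny allow list between zones. The following is a minimal sketch rather than CMB's implementation: the `Rule` type, the `is_allowed` function, the CIDR range, and the port are all hypothetical and chosen only for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """One allow rule: traffic from source_cidr may reach the given port."""
    source_cidr: str
    port: int
    protocol: str = "tcp"


def is_allowed(rules: Iterable[Rule], source_ip: str, port: int, protocol: str = "tcp") -> bool:
    """Default deny: traffic passes only if some rule explicitly matches it."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and ip in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


if __name__ == "__main__":
    # Hypothetical policy: only the DMZ subnet may call the BIZ tier on port 8443.
    biz_ingress = [Rule(source_cidr="10.10.0.0/16", port=8443)]
    print(is_allowed(biz_ingress, "10.10.3.7", 8443))  # inside the allowed subnet
    print(is_allowed(biz_ingress, "10.20.0.1", 8443))  # outside the allowed subnet
```

Where regulators require physical isolation, a software rule set like this is not a substitute; it only describes the zones where virtual isolation is permitted.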
3.2 Network
Software‑Defined Networking (SDN) introduced latency, packet loss, and capacity problems, which were tuned incrementally until tenant complaints stopped.
3.3 User Experience
Moving from a hands‑on, “nanny‑style” service model to a self‑service model required tenant‑centric design and clear responsibility‑sharing mechanisms.
3.4 Operations & Maintenance
Cloud operations became as important as traditional ops; dedicated teams handle cloud governance, metering, and cost control.
3.5 Business Continuity
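High availability is ensured through AZ isolation, rapid failover, and a “detect‑isolate‑switch” strategy: when a fault is detected, the affected AZ is isolated and traffic is switched away first, and root‑cause diagnosis follows later, instead of diagnosing the fault before restoring service as in traditional operations.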
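The “detect‑isolate‑switch” idea can be sketched in a few lines. This is an illustrative model under stated assumptions, not the bank's actual failover logic: the `Zone` type, the threshold of three consecutive failed probes, and the even redistribution of traffic weights are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # assumed: consecutive failed probes before an AZ is isolated


@dataclass
class Zone:
    name: str
    weight: int              # percentage of traffic currently routed to this AZ
    failed_probes: int = 0   # consecutive health-probe failures
    isolated: bool = False


def record_probe(zone: Zone, healthy: bool) -> None:
    """Detect: count consecutive probe failures; any success resets the count."""
    zone.failed_probes = 0 if healthy else zone.failed_probes + 1


def isolate_and_switch(zones: list[Zone]) -> None:
    """Isolate AZs over the threshold, then spread all traffic over the rest."""
    for zone in zones:
        if not zone.isolated and zone.failed_probes >= FAILURE_THRESHOLD:
            zone.isolated = True
    healthy = [zone for zone in zones if not zone.isolated]
    if not healthy:
        raise RuntimeError("no healthy availability zone left to receive traffic")
    share, remainder = divmod(100, len(healthy))
    for zone in zones:
        zone.weight = 0
    for index, zone in enumerate(healthy):
        # Hand out any leftover percentage points to the first zones.
        zone.weight = share + (1 if index < remainder else 0)


if __name__ == "__main__":
    zones = [Zone("az-a", 34), Zone("az-b", 33), Zone("az-c", 33)]
    for _ in range(FAILURE_THRESHOLD):
        record_probe(zones[1], healthy=False)
    isolate_and_switch(zones)
    for zone in zones:
        print(zone.name, zone.weight, "isolated" if zone.isolated else "serving")
```

Note that nothing in the sketch tries to work out why the zone failed; that step is deliberately left for after traffic has been moved.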
3.6 Migration Challenges
Large‑scale migration required traffic scheduling, gray‑release, hybrid deployment, IP preservation, unified logging, and end‑to‑end tracing.
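Gray release during migration typically means sending a controlled share of traffic to the new environment and raising that share step by step. The sketch below shows one common way to do this, stable hash bucketing, and is not taken from CMB's platform: the `bucket_for` and `route` functions and the "cloud"/"legacy" labels are hypothetical.

```python
import hashlib


def bucket_for(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, cloud_percent: int) -> str:
    """Send roughly cloud_percent of users to the migrated (cloud) deployment."""
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    return "cloud" if bucket_for(user_id) < cloud_percent else "legacy"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    for percent in (5, 20, 50, 100):
        on_cloud = sum(route(user, percent) == "cloud" for user in users)
        print(f"target {percent}%: {on_cloud} of {len(users)} users routed to cloud")
```

Because the bucket depends only on the user ID, a user who has been moved to the new deployment stays there as the percentage grows, and setting the percentage back to 0 rolls everyone back to the legacy side.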
4. Experience and Trends
4.1 Top‑Level Design
Continuous iteration of enterprise‑level cloud capability frameworks and principles is essential.
4.2 Core Platform Components
Platformization and service‑orientation hide infrastructure complexity; CMB developed its own cloud‑management and cloud‑operations platforms, CMDB, monitoring, and event systems.
4.3 Agile Infrastructure
Standardized application architectures enable scalable, elastic resource usage; the IaaS delivery success rate exceeds 97%, with provisioning times under 22 minutes.
4.4 Resilience First
Decoupled availability, multi‑layer protection, and gray‑upgrade ensure business continuity.
4.5 Openness
Dual‑stack (X86 + domestic chips) and dual‑zone (general + trusted) architecture satisfy business needs, regulatory requirements, and cost sustainability.
4.6 Cutting‑Edge Technology
Continuous agile iteration drives the transition from On‑Cloud to In‑Cloud, leveraging cloud‑native development paradigms, micro‑service models, and chaos engineering.