How We Built a Multi‑Cloud, Multi‑Active Architecture at Zuoyebang
This article details Zuoyebang's journey from a single‑cloud setup to a multi‑cloud, multi‑active architecture, covering business drivers, design principles, network planning, compute and storage strategies, traffic scheduling, container migration, operational management, and the measurable cost, stability, and efficiency benefits achieved.
Multi‑Cloud Architecture at Zuoyebang
Ten years ago many tech companies debated which public cloud to choose; today a single‑cloud approach is being abandoned in favor of multi‑cloud, multi‑active architectures. Zuoyebang, a data‑intensive company, transformed its cloud environment after experiencing high traffic and concurrency, becoming an early adopter of cloud‑native practices.
Business Background
Zuoyebang serves tool, e‑commerce, live‑streaming, and smart‑hardware scenarios, with thousands of services, tens of thousands of instances, and hundreds of thousands of CPU cores, primarily using Go and PHP.
Relying on a public‑cloud foundation gave early cost benefits but also introduced challenges: single‑cloud failures, vendor lock‑in, and the need for disaster recovery, cost optimization, and data sovereignty. These pressures drove the shift to a multi‑cloud strategy.
Architecture Design: Choose Provider, Then Design
Key capability indicators include disaster recovery, failover, cost optimization, avoiding lock‑in, data sovereignty, and specific access. A weighted scoring model was used to select the optimal multi‑cloud solution.
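The weighted scoring step can be illustrated with a minimal sketch. The indicator names come from the list above, but the weights, the 0 to 10 scale, and the two candidate options are hypothetical values chosen for illustration; the source does not publish the actual figures.

```python
# Hypothetical weights: the source names the indicators but not their values.
WEIGHTS = {
    "disaster_recovery": 0.25,
    "failover": 0.20,
    "cost_optimization": 0.20,
    "avoid_lock_in": 0.15,
    "data_sovereignty": 0.10,
    "specific_access": 0.10,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of per-indicator scores (each on a 0-10 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing indicator scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank_options(options, weights=WEIGHTS):
    """Sort candidate solutions from highest to lowest weighted score."""
    return sorted(
        options,
        key=lambda name: weighted_score(options[name], weights),
        reverse=True,
    )


# Hypothetical candidates and scores, for illustration only.
candidates = {
    "single_cloud": {
        "disaster_recovery": 3, "failover": 3, "cost_optimization": 6,
        "avoid_lock_in": 2, "data_sovereignty": 5, "specific_access": 5,
    },
    "multi_cloud_multi_active": {
        "disaster_recovery": 9, "failover": 8, "cost_optimization": 7,
        "avoid_lock_in": 9, "data_sovereignty": 7, "specific_access": 7,
    },
}

ranking = rank_options(candidates)
```

Keeping the weights in one table makes the trade‑off explicit: changing a priority means changing a single number, and the ranking can be recomputed and reviewed.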
Design goals after selection:
Prioritize north‑south traffic scheduling at entry points to avoid single‑line failures.
Keep service registration and discovery self‑contained within each cloud, so that service calls resolve to instances in the same cloud.
Deploy identical services across clouds in the same city.
Migrate RDS to self‑built storage to avoid deep vendor lock‑in.
Enable low‑latency inter‑cloud networking.
Multi‑Cloud Practices
1. Multi‑Cloud Interconnect
The team implemented a dual‑provider interconnect with CPE control. It provides low‑latency inter‑cloud connectivity, flexible bandwidth scaling, cross‑cloud traffic analysis, automatic failover, and rapid onboarding of new providers.
2. Network Planning
The network is planned around security zones and deployment environments (IDC, production, office). Within each environment, service types (data storage vs. PaaS) are placed in distinct subnets.
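One way to picture this segmentation is to carve a single address block into non‑overlapping subnets, one per environment and service type. The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/16` block, the `/20` subnet size, and the function name `plan_subnets` are assumptions for illustration, not Zuoyebang's actual address plan.

```python
import ipaddress


def plan_subnets(cidr, environments, service_types, new_prefix):
    """Assign one non-overlapping subnet to each (environment, service type) pair."""
    network = ipaddress.ip_network(cidr)
    subnets = network.subnets(new_prefix=new_prefix)  # generator, in address order
    plan = {}
    for env in environments:
        for service_type in service_types:
            try:
                plan[(env, service_type)] = next(subnets)
            except StopIteration:
                raise ValueError("address block too small for the requested segments") from None
    return plan


# Hypothetical address block and segment names.
plan = plan_subnets(
    "10.0.0.0/16",
    environments=["idc", "production", "office"],
    service_types=["data_storage", "paas"],
    new_prefix=20,
)

for (env, service_type), subnet in plan.items():
    print(f"{env:<10} {service_type:<12} {subnet}")
```

Allocating every segment from one generator guarantees the subnets never overlap, which keeps firewall rules between security zones simple to reason about.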
3. Compute Management
Standardized machine types (bare metal, GPU, AMD, VM) are abstracted into packages; a CMDB tracks the full lifecycle of each instance.
4. Data Storage
RDS was migrated to self‑built storage to avoid vendor lock‑in. In multi‑cloud scenarios the classic CAP trade‑off applies; workloads choose CP or AP based on consistency vs. availability requirements.
5. Synchronous Communication
Container service discovery uses native K8s Service with IPVS; virtual machines register via a custom Sync service that maps ZNS node info to Service Endpoints.
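The mapping performed by the Sync service can be sketched as a pure function that turns virtual‑machine node records into a Kubernetes `Endpoints` manifest. The shape of the node records (`ip`, `port`, `healthy`) and the function name are hypothetical, since the source does not describe the ZNS data model; only the `Endpoints` structure itself follows the Kubernetes v1 API.

```python
def build_endpoints(service_name, namespace, nodes, port_name="http"):
    """Build a Kubernetes v1 Endpoints manifest (as a dict) from VM node records.

    Each node record is a dict with hypothetical keys: "ip", "port", "healthy".
    Unhealthy nodes are left out so that traffic is not routed to them.
    """
    healthy = [node for node in nodes if node.get("healthy", False)]

    # Endpoints subsets pair a list of addresses with a list of ports,
    # so nodes are grouped by the port they listen on.
    by_port = {}
    for node in healthy:
        by_port.setdefault(node["port"], []).append(node["ip"])

    subsets = [
        {
            "addresses": [{"ip": ip} for ip in sorted(ips)],
            "ports": [{"name": port_name, "port": port, "protocol": "TCP"}],
        }
        for port, ips in sorted(by_port.items())
    ]
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": service_name, "namespace": namespace},
        "subsets": subsets,
    }


# Hypothetical node records, as a name service might report them.
vm_nodes = [
    {"ip": "10.0.48.12", "port": 8080, "healthy": True},
    {"ip": "10.0.48.11", "port": 8080, "healthy": True},
    {"ip": "10.0.48.13", "port": 8080, "healthy": False},
]
manifest = build_endpoints("order-api", "default", vm_nodes)
```

For Kubernetes to route to these addresses, the `Endpoints` object must share its name with a Service that has no selector. Callers inside the cluster can then reach virtual‑machine workloads through the same Service name they use for containers.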
6. Traffic Scheduling
Traditional DNS suffers from long TTL, hijacking, and cache issues. Zuoyebang built DoH/DoT services, routing all client requests through a unified SDK that resolves to trusted IPs, dramatically improving success rates and enabling precise cross‑cloud traffic ratios.
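To make the idea of precise cross‑cloud traffic ratios concrete, the following sketch shows one possible way a resolver could split clients between clouds: each client ID is hashed into a stable bucket, and buckets are assigned to clouds in proportion to the configured ratio. This is an illustrative technique, not the SDK's documented behavior; the cloud names, IP addresses (taken from the documentation ranges), and function names are all assumptions.

```python
import hashlib

BUCKETS = 10_000


def pick_cloud(client_id, ratios):
    """Map a client to a cloud so that the client population splits according to `ratios`."""
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must sum to a positive number")

    # Hash the client ID into a stable bucket in [0, BUCKETS), so the same
    # client keeps resolving to the same cloud while the ratios are unchanged.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS

    threshold = 0.0
    for cloud, weight in ratios.items():
        threshold += weight / total * BUCKETS
        if bucket < threshold:
            return cloud
    return cloud  # guards against floating-point rounding on the last bucket


def resolve(client_id, ratios, trusted_ips):
    """Return the trusted entry IPs of the cloud chosen for this client."""
    return trusted_ips[pick_cloud(client_id, ratios)]


# Hypothetical configuration: 70% of clients to cloud_a, 30% to cloud_b.
ratios = {"cloud_a": 70, "cloud_b": 30}
trusted_ips = {
    "cloud_a": ["203.0.113.10", "203.0.113.11"],
    "cloud_b": ["198.51.100.20"],
}
ips = resolve("device-42", ratios, trusted_ips)
```

Because the assignment depends only on the client ID and the ratio table, changing the ratio moves a predictable share of clients and nothing else, which is what makes a controlled cross‑cloud shift possible.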
7. Container Migration
Service containerization standardizes deployment, simplifies multi‑cloud rollout, and reduces operational overhead.
8. Multi‑Cloud Migration
Deploy an identical set of services to the second IDC, then gradually shift north‑south traffic to it. Services that depend on asynchronous messaging (e.g., RocketMQ) required proxy services to hide cross‑cloud dependencies during the migration.
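A gradual shift of this kind is often driven by a simple ramp with a health gate. The sketch below is a hypothetical policy, not the one described in the source: the share of traffic sent to the new deployment grows by a fixed step while the observed error rate stays under a threshold, and drops back to zero otherwise. The step size, threshold, and function names are assumptions.

```python
def next_share(current, step, error_rate, max_error_rate):
    """Return the next percentage of traffic (0-100) to send to the new deployment."""
    if error_rate > max_error_rate:
        return 0  # roll all traffic back to the original deployment
    return min(100, current + step)


def simulate_migration(error_rates, step=10, max_error_rate=0.01):
    """Replay a sequence of observed error rates and record the share after each check."""
    share = 0
    history = []
    for error_rate in error_rates:
        share = next_share(share, step, error_rate, max_error_rate)
        history.append(share)
    return history


# Hypothetical observations: the third check exceeds the 1% threshold.
history = simulate_migration([0.001, 0.002, 0.05, 0.001], step=25)
print(history)  # [25, 50, 0, 25]
```

A full rollback on the first bad reading is a deliberately conservative choice; a real rollout might instead hold at the current share or step down gradually.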
Operational Management
Multi‑cloud operation follows a three‑pronged approach: solidify architecture standards, apply DevOps platform capabilities for consistent delivery, and use metrics, logging, and tracing to detect and remediate deviations.
Contingency plans (pre‑plans) are decomposed into atomic actions, which are then composed into scenario‑level and global pre‑plans, enabling rapid, automated disaster recovery across clouds.
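The composition of atomic actions into larger plans can be sketched as a small tree structure: an action is a named callable, and a plan is an ordered list of actions or nested plans. The class names, the stop‑on‑first‑failure rule, and the example actions below are hypothetical; the source does not describe how the pre‑plan platform is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union


@dataclass
class Action:
    """An atomic step, e.g. switching a DNS ratio or promoting a replica."""
    name: str
    run: Callable[[], bool]  # returns True on success


@dataclass
class PrePlan:
    """An ordered list of actions or nested pre-plans."""
    name: str
    steps: List[Union[Action, "PrePlan"]] = field(default_factory=list)

    def execute(self, log: List[Tuple[str, bool]]) -> bool:
        """Run steps in order, logging each action; stop at the first failure."""
        for step in self.steps:
            if isinstance(step, PrePlan):
                ok = step.execute(log)
            else:
                ok = step.run()
                log.append((step.name, ok))
            if not ok:
                return False
        return True


# Hypothetical atomic actions; real ones would call platform APIs.
shift_traffic = Action("shift_traffic_to_cloud_b", lambda: True)
promote_replica = Action("promote_db_replica", lambda: True)
scale_out = Action("scale_out_cloud_b", lambda: True)

# Scenario-level plans composed into a global plan.
entry_failover = PrePlan("entry_failover", [scale_out, shift_traffic])
storage_failover = PrePlan("storage_failover", [promote_replica])
cloud_a_outage = PrePlan("cloud_a_outage", [entry_failover, storage_failover])

log: List[Tuple[str, bool]] = []
succeeded = cloud_a_outage.execute(log)
```

Because each atomic action is defined once and reused, a fix to one action benefits every scenario‑level and global plan that includes it.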
Benefits
Cost: competitive vendor pricing and flexible capacity scaling reduce waste.
Stability: multi‑active design plus observability and pre‑plans improve service reliability.
Efficiency: containerization and DevOps accelerate delivery and reduce manual effort.
Future Outlook
Further evolution will focus on distributed storage, precise north‑south traffic ratios, and deeper heterogeneous multi‑cloud capabilities.
In summary, Zuoyebang’s multi‑cloud, multi‑active architecture demonstrates that building robust cloud‑native systems requires both solid engineering and disciplined operations.
