Why Multi-Cloud Active-Active Architecture Is the Key to Stability and Cost Efficiency
This article explores the motivations, challenges, and design principles behind adopting a multi‑cloud active‑active architecture, emphasizing how it enhances stability, reduces costs, and improves efficiency, while detailing practical solutions for networking, compute, containers, service discovery, traffic routing, and data storage in a cloud‑native environment.
Background
Enterprises choose hybrid cloud primarily for stability, cost, and service considerations, with the ultimate goal of achieving an active‑active architecture.
1. Stability
During early business exploration, a single‑cloud, single‑active setup is common because it is efficient. As the business grows, single‑active can no longer meet stability requirements, which prompts deployment across multiple availability zones. Multi‑zone deployment improves stability within one provider, but the provider's central services (e.g., regional networking, unified billing) remain shared single points of failure. Workloads with highly concentrated traffic, such as online courses, health codes, or ride‑hailing, demand higher SLAs, making multi‑cloud active‑active a necessary trend.
2. Cost & Service
After migrating to the cloud, companies seek to maximize cost efficiency. Bringing in additional vendors helps reduce costs, whether the extra cloud serves disaster recovery, peak‑time elasticity, or business segmentation. The most thorough approach is equal deployment across clouds, which lets traffic and capacity be scheduled freely between them.
Challenges
While the benefits of active‑active architecture are clear, significant challenges arise in stability, cost, and efficiency.
1. Stability
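Active‑active aims to solve stability, but if each cloud is not a fully self‑contained closed loop, inter‑service dependencies can increase the failure rate instead. Assume two clouds fail independently with probabilities n and m. If every service runs on both clouds and either cloud alone can serve traffic, the system fails only when both clouds fail, with probability n × m, far lower than a single cloud's. With a heterogeneous deployment, however, a critical service that runs on only one cloud raises the effective failure rate to as much as max(n, m), and a request path that depends on services in both clouds fails when either cloud fails, with probability 1 − (1 − n)(1 − m) = n + m − nm ≈ n + m.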
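The three cases above can be computed directly. This is a minimal sketch assuming independent cloud outages; the function names and the example probabilities are illustrative, not figures from Zuoyebang.

```python
# Failure probability of a service spread over two clouds, assuming the two
# clouds fail independently. n and m are per-period outage probabilities.

def both_must_fail(n: float, m: float) -> float:
    """Every service runs on both clouds; either cloud alone can serve traffic."""
    return n * m

def pinned_to_one_cloud(n: float, m: float) -> float:
    """A critical service runs on one cloud only; worst case is the less reliable cloud."""
    return max(n, m)

def depends_on_both(n: float, m: float) -> float:
    """A request path needs services in both clouds, so either outage breaks it."""
    return 1 - (1 - n) * (1 - m)  # equals n + m - n*m, roughly n + m for small values

if __name__ == "__main__":
    n, m = 0.001, 0.002  # illustrative values only
    for label, f in [("both must fail", both_must_fail),
                     ("pinned to one cloud", pinned_to_one_cloud),
                     ("depends on both", depends_on_both)]:
        print(f"{label:>20}: {f(n, m):.6f}")
```

The ordering is what matters: full closure multiplies the probabilities, while partial closure can leave the system worse off than running on either cloud alone.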
2. Cost
Multi‑cloud is also intended to address cost, yet if no single cloud can be trusted to stay stable, each cloud must be provisioned to carry the full load on its own. That redundant capacity is wasteful: under normal conditions only about half of the total capacity is in use.
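The "half the capacity" figure follows from a simple sizing rule. This sketch assumes load is split evenly and that the surviving clouds must absorb the full peak after any single cloud fails; the function names are hypothetical.

```python
from fractions import Fraction

def total_capacity(clouds: int) -> Fraction:
    """Total capacity, as a multiple of peak load, so that the remaining clouds
    can carry the full peak after any single cloud fails (even load split assumed)."""
    if clouds < 2:
        raise ValueError("need at least two clouds to survive a single-cloud outage")
    per_cloud = Fraction(1, clouds - 1)  # each cloud sized for peak / (N - 1)
    return clouds * per_cloud

def peak_utilization(clouds: int) -> Fraction:
    """Share of provisioned capacity in use at peak while all clouds are healthy."""
    return 1 / total_capacity(clouds)

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "clouds:", total_capacity(n), "x peak, utilization", peak_utilization(n))
```

With two clouds each one must carry the whole peak, so total capacity is 2× peak and utilization is 1/2. Under the same assumptions, three clouds need 1.5× peak in total (utilization 2/3).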
3. Efficiency
Both development and operations efficiency suffer when the clouds are not deployed identically, and the resulting friction feeds back into lower stability. Without equal deployment, continuous failover drills are needed to keep the architecture effective, and any lapse in drilling can leave it unable to cope with a real single‑cloud outage.
Design Goals
To achieve the desired stability and cost benefits, the online business at Zuoyebang adopts a multi‑cloud active‑active strategy, leveraging Kubernetes’ unified north‑bound APIs to smooth out differences across clouds.
Architecture Overview
Network
Zuoyebang has developed a multi‑cloud networking solution with CPE (customer‑premises equipment) control. It provides inter‑cloud connectivity, elastic bandwidth scaling, cross‑cloud traffic observability, automatic failover for node and line failures, and rapid onboarding of new cloud providers.
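Automatic failover between inter‑cloud lines can be pictured as a health‑checked priority list. The following is a minimal sketch, not Zuoyebang's CPE implementation: the `Line` type, the line names, and the three‑probe threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Line:
    """One inter-cloud link (dedicated line, VPN tunnel, ...). Names are hypothetical."""
    name: str
    priority: int                 # lower value = preferred
    fail_threshold: int = 3       # consecutive failed probes before the line counts as down
    consecutive_failures: int = 0

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.fail_threshold

    def record_probe(self, ok: bool) -> None:
        # A successful probe resets the counter; a failed probe increments it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

def active_line(lines: list[Line]) -> Line | None:
    """Return the preferred healthy line, or None if every line is down."""
    healthy = [ln for ln in lines if ln.healthy]
    return min(healthy, key=lambda ln: ln.priority) if healthy else None

if __name__ == "__main__":
    lines = [Line("line-a", priority=0), Line("line-b", priority=1)]
    for _ in range(3):
        lines[0].record_probe(ok=False)  # line-a misses three probes in a row
    print(active_line(lines).name)       # traffic moves to line-b
    lines[0].record_probe(ok=True)       # line-a recovers
    print(active_line(lines).name)       # traffic returns to line-a
```

Requiring several consecutive failures before switching avoids flapping on a single lost probe, at the cost of a slightly slower failover.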
Compute
Even under a single‑cloud model, the SYS team carries a heavy burden maintaining many instance types. Multi‑cloud multiplies this into a combinatorial explosion of instance varieties, which can only be managed by standardizing on a limited set of primary instance types and offering scenario‑based packages on top of them.
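Standardization can be sketched as mapping every vendor offering onto a small set of standard packages and leaving the rest out of the default catalog. The memory‑per‑vCPU ratios, package names, and instance type names below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical standard packages, keyed by GiB of memory per vCPU.
STANDARD_RATIOS = {"compute": 2, "general": 4, "memory": 8}

@dataclass(frozen=True)
class Offering:
    cloud: str      # hypothetical cloud identifier
    type_name: str  # vendor-specific instance type (hypothetical)
    vcpu: int
    mem_gib: int

def standard_spec(o: Offering) -> str | None:
    """Return a standard spec such as 'general-8c32g', or None for a non-standard shape."""
    for family, ratio in STANDARD_RATIOS.items():
        if o.mem_gib == ratio * o.vcpu:
            return f"{family}-{o.vcpu}c{o.mem_gib}g"
    return None

def build_catalog(offerings: list[Offering]) -> dict[str, list[Offering]]:
    """Group vendor offerings under standard specs; non-standard shapes are dropped."""
    catalog: dict[str, list[Offering]] = {}
    for o in offerings:
        spec = standard_spec(o)
        if spec is not None:
            catalog.setdefault(spec, []).append(o)
    return catalog

if __name__ == "__main__":
    offerings = [
        Offering("cloud-a", "a.std.8x32", 8, 32),
        Offering("cloud-b", "b-general-8", 8, 32),
        Offering("cloud-b", "b-odd-6", 6, 20),  # 20/6 GiB per vCPU: not a standard shape
    ]
    for spec, members in build_catalog(offerings).items():
        print(spec, [f"{m.cloud}:{m.type_name}" for m in members])
```

Schedulers and capacity planning can then work in terms of standard specs, with each spec backed by equivalent instance types on every cloud.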
Container Technology
Containers bridge IaaS differences, but cloud‑native capabilities such as online/offline workload co‑location and Serverless require standardized container middleware across all major cloud providers.
Service Registration & Discovery
Service discovery must be transparent to business code so that hybrid‑cloud scheduling causes minimal disruption. Zuoyebang replaced its service registry with a solution that allows a smooth transition and supports both synchronous RPC and asynchronous calls.
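The source does not say how the smooth transition was achieved. A common technique is a dual‑read resolver that consults the new registry first and falls back to the old one, so business code sees a single lookup interface throughout the migration. The sketch below assumes that technique; `Registry`, `DictRegistry`, and `MigratingResolver` are hypothetical names, and the in‑memory registries stand in for real ones.

```python
from __future__ import annotations

from typing import Protocol

class Registry(Protocol):
    def lookup(self, service: str) -> list[str]: ...

class DictRegistry:
    """In-memory stand-in for a real registry (hypothetical)."""
    def __init__(self, entries: dict[str, list[str]]):
        self._entries = entries

    def lookup(self, service: str) -> list[str]:
        return list(self._entries.get(service, []))

class MigratingResolver:
    """Dual-read resolver: ask the new registry first, fall back to the old one.
    Business code only ever calls resolve(), so the migration stays invisible to it."""
    def __init__(self, new: Registry, old: Registry):
        self.new = new
        self.old = old

    def resolve(self, service: str) -> list[str]:
        endpoints = self.new.lookup(service)
        return endpoints if endpoints else self.old.lookup(service)

if __name__ == "__main__":
    old = DictRegistry({"user-svc": ["10.1.0.1:8080"], "pay-svc": ["10.1.0.2:8080"]})
    new = DictRegistry({"user-svc": ["10.0.0.1:8080"]})  # only user-svc migrated so far
    resolver = MigratingResolver(new, old)
    print(resolver.resolve("user-svc"))  # served by the new registry
    print(resolver.resolve("pay-svc"))   # falls back to the old registry
```

Services can then be moved to the new registry one at a time, and the fallback is removed once the old registry is empty.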
Service Observation
Unified logging, monitoring, and tracing provide a single view, reducing the impact of hybrid‑cloud complexity.
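A single view usually means every sample carries the cloud it came from, so the same data can be aggregated per cloud or across clouds. The following sketch assumes metrics arrive as per‑cloud lists of request and error counts; the function and field names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

def merge_views(per_cloud: dict[str, list[dict]]) -> list[dict]:
    """Tag every sample with its source cloud so one view can hold all clouds."""
    return [{**sample, "cloud": cloud}
            for cloud, samples in per_cloud.items()
            for sample in samples]

def error_rate_by(samples: list[dict], key: str) -> dict[str, float]:
    """Aggregate error rate grouped by any label, e.g. 'cloud' or 'service'."""
    requests: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for s in samples:
        requests[s[key]] += s["requests"]
        errors[s[key]] += s["errors"]
    return {k: errors[k] / requests[k] for k in requests if requests[k]}

if __name__ == "__main__":
    merged = merge_views({
        "cloud-a": [{"service": "api", "requests": 1000, "errors": 5}],
        "cloud-b": [{"service": "api", "requests": 1000, "errors": 15}],
    })
    print(error_rate_by(merged, "cloud"))    # per-cloud view
    print(error_rate_by(merged, "service"))  # cross-cloud view of the same data
```

Grouping by the cloud label exposes a degraded cloud that a cross‑cloud average would partly hide.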
Traffic Scheduling
North‑south traffic is routed primarily by domain name. Multi‑cloud traffic distribution targets an error within ±1% of the configured split, with failover completed within five minutes. Zuoyebang built a DNS‑over‑HTTPS (DoH) solution on CoreDNS to supplement conventional DNS for robust traffic steering.
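The document does not describe how the ±1% target is met. One way to split traffic precisely is to hash each client into a fixed number of buckets and assign bucket ranges to clouds by weight. Because a DoH endpoint is operated by the service owner rather than by third‑party resolvers, such an endpoint could in principle apply per‑client weights instead of relying on cached DNS answers. The sketch assumes that scheme; the bucket count, weights, and cloud names are hypothetical.

```python
from __future__ import annotations

import hashlib

BUCKETS = 10_000  # 0.01% granularity, far finer than a ±1% target

def bucket(client_id: str) -> int:
    """Stable bucket in [0, BUCKETS) derived from a hash of the client ID."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def pick_cloud(client_id: str, weights: dict[str, int]) -> str:
    """Assign a client to a cloud; integer weights must sum to BUCKETS."""
    if sum(weights.values()) != BUCKETS:
        raise ValueError("weights must sum to BUCKETS")
    b = bucket(client_id)
    upper = 0
    for cloud in sorted(weights):  # fixed order keeps bucket ranges stable
        upper += weights[cloud]
        if b < upper:
            return cloud
    raise AssertionError("unreachable: the weights cover every bucket")

if __name__ == "__main__":
    weights = {"cloud-a": 7000, "cloud-b": 3000}  # 70/30 split
    clients = [f"client-{i}" for i in range(100_000)]
    share_a = sum(pick_cloud(c, weights) == "cloud-a" for c in clients) / len(clients)
    print(f"cloud-a share: {share_a:.4f}")
```

Because assignment depends only on the client ID, clients stay sticky to one cloud, and with two clouds a weight change moves only the clients whose buckets cross the boundary. Setting one cloud's weight to 0 sends every client to the other, which is the failover case.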
Data Storage
Multi‑cloud storage faces the classic CAP trade‑off. Depending on business needs, Zuoyebang adopts a primary‑replica, unit‑based, or MGR (MySQL Group Replication) solution to balance availability and consistency.
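Of the three options, the unit‑based approach is the easiest to show in code: users are sharded into units, and each unit has one home cloud that accepts its writes. The sketch assumes a hypothetical two‑unit layout and an operator‑supplied override for failover; primary‑replica and MGR are database‑level configurations and are not sketched here.

```python
from __future__ import annotations

# Hypothetical layout: users are split into 100 shards, shards into two units,
# and each unit has exactly one home cloud that accepts its writes.
SHARDS = 100
UNIT_HOME = {"unit-0": "cloud-a", "unit-1": "cloud-b"}

def unit_for(user_id: int) -> str:
    shard = user_id % SHARDS
    return "unit-0" if shard < 50 else "unit-1"

def write_cloud(user_id: int, overrides: dict[str, str] | None = None) -> str:
    """Cloud that should accept writes for this user.
    `overrides` lets an operator move a unit's writes, e.g. during a single-cloud outage;
    that is only safe once the target cloud holds an up-to-date copy of the unit's data."""
    unit = unit_for(user_id)
    if overrides and unit in overrides:
        return overrides[unit]
    return UNIT_HOME[unit]

if __name__ == "__main__":
    print(write_cloud(7), write_cloud(57))                   # normal routing
    print(write_cloud(57, overrides={"unit-1": "cloud-a"}))  # unit-1 failed over
```

Giving each user a single write location keeps that user's data consistent without cross‑cloud write coordination; the availability cost shows up during failover, when the unit's replica on the other cloud must be current before writes move.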
Application Layer
Two primary requirements exist: (1) prohibit cross‑cloud calls during normal operation to avoid cascading (“snowball”) failures, and (2) allow flexible cross‑cloud traffic routing in special cases such as data‑center migration or a single‑cloud incident. Zuoyebang resolves this with isolation zones plus a connectivity zone, and it separately isolates real‑time big‑data services that lack native multi‑cloud support.
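One possible reading of the isolation/connectivity split, expressed as a call policy: in normal operation callers reach only instances in their own cloud, and in special mode cross‑cloud calls are allowed but only to services in the connectivity zone. This is an interpretation, not Zuoyebang's documented rule; `Mode`, `Endpoint`, and the zone labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"    # everyday operation: no cross-cloud calls
    SPECIAL = "special"  # e.g. data-center migration or a single-cloud incident

@dataclass(frozen=True)
class Endpoint:
    service: str
    cloud: str
    zone: str  # "isolation" or "connectivity" (hypothetical labels)

def call_allowed(caller: Endpoint, callee: Endpoint, mode: Mode) -> bool:
    if caller.cloud == callee.cloud:
        return True                       # same-cloud calls are always fine
    if mode is Mode.NORMAL:
        return False                      # requirement (1): block cross-cloud calls
    return callee.zone == "connectivity"  # requirement (2): cross only via connectivity zone

def eligible(caller: Endpoint, candidates: list[Endpoint], mode: Mode) -> list[Endpoint]:
    """Prefer same-cloud instances; otherwise keep only cross-cloud ones the policy allows."""
    local = [c for c in candidates if c.cloud == caller.cloud]
    if local:
        return local
    return [c for c in candidates if call_allowed(caller, c, mode)]
```

In normal mode an empty result means the call fails locally instead of spilling into the other cloud, which is exactly what keeps one cloud's trouble from snowballing into the other.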
Conclusion
The multi‑cloud active‑active architecture at Zuoyebang is not merely about managing multiple Kubernetes clusters or traffic routing; it represents a comprehensive enterprise solution spanning resources, platforms, and applications, requiring close collaboration across SYS, container R&D, middleware, SRE, DBA, DevOps, FinOps, and security teams.
Zuoyebang Tech Team
Sharing technical practices from Zuoyebang