
Why Multi-Cloud Active-Active Architecture Is the Key to Stability and Cost Efficiency

This article explores the motivations, challenges, and design principles behind adopting a multi‑cloud active‑active architecture, emphasizing how it enhances stability, reduces costs, and improves efficiency, while detailing practical solutions for networking, compute, containers, service discovery, traffic routing, and data storage in a cloud‑native environment.

Zuoyebang Tech Team

Background

Enterprises choose hybrid cloud primarily for stability, cost, and service considerations, with the ultimate goal of achieving an active‑active architecture.

1. Stability

During early business exploration, a single‑cloud, single‑active setup is common because it is the most efficient to run. As the business grows, single‑active can no longer meet stability requirements, which prompts deployment across multiple availability zones. Cloud providers keep improving their own stability, but central services (e.g., regional networking, unified billing) remain potential single points of failure. Workloads with highly concentrated traffic, such as online courses, health codes, or ride‑hailing, demand a higher SLA, which makes multi‑cloud active‑active a natural next step.

2. Cost & Service

After migrating to the cloud, companies seek to maximize cost efficiency. Adding more vendors helps reduce costs, whether the extra cloud is used for disaster recovery, peak‑time elasticity, or business segmentation. The most thorough approach is to deploy equally on each cloud, so that both traffic and capacity can be scheduled between them.

Challenges

While the benefits of active‑active architecture are clear, significant challenges arise in stability, cost, and efficiency.

1. Stability

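Active‑active is meant to improve stability, but when services are not fully self‑contained within each cloud and depend on one another across clouds, the failure rate can actually rise. Suppose the two clouds fail independently with probabilities n and m. If every service is deployed equally on both, the business is down only when both clouds fail, so the combined failure rate is n × m, far lower than for a single cloud. With heterogeneous deployments, however, a critical service that runs on only one cloud ties the effective failure rate to that cloud, up to max(n, m), and a request path that needs both clouds fails whenever either does, for a rate of roughly n + m.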
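The three failure-rate cases above can be compared numerically. The following is a minimal sketch that assumes the two clouds fail independently; the function names and the sample probabilities are illustrative and do not come from Zuoyebang's measurements.

```python
def fully_redundant(n: float, m: float) -> float:
    """Every service runs on both clouds: down only when both clouds are down."""
    return n * m


def pinned_to_one_cloud(n: float, m: float) -> float:
    """A critical service runs on a single cloud (worst case: the less reliable one)."""
    return max(n, m)


def needs_both_clouds(n: float, m: float) -> float:
    """A request path crosses both clouds: it breaks when either cloud fails."""
    # Exact value is n + m - n*m, which is close to n + m when n and m are small.
    return 1 - (1 - n) * (1 - m)


if __name__ == "__main__":
    n, m = 0.001, 0.002  # hypothetical per-cloud failure probabilities
    for case in (fully_redundant, pinned_to_one_cloud, needs_both_clouds):
        print(f"{case.__name__}: {case(n, m):.6%}")
```

The gap between the first and last case spans about three orders of magnitude for these inputs, which is why uneven placement of a single critical service can erase the benefit of the second cloud.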

2. Cost

Multi‑cloud is meant to reduce cost, yet if no single cloud can be trusted to stay up, each cloud must hold enough redundant capacity to absorb the other's traffic. With two clouds, only about half of the provisioned capacity is used under normal conditions, which is substantial waste.
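The "half the capacity" figure follows from a simple sizing rule. As a rough sketch, assume traffic is spread evenly across clouds and the system must survive the loss of any one cloud at peak load; the function names and the load units are hypothetical.

```python
def capacity_per_cloud(peak_load: float, clouds: int) -> float:
    """Capacity each cloud needs so the survivors can carry the full peak."""
    if clouds < 2:
        raise ValueError("surviving a cloud outage needs at least two clouds")
    return peak_load / (clouds - 1)


def normal_utilization(clouds: int) -> float:
    """Share of total provisioned capacity in use at peak when all clouds are healthy."""
    if clouds < 2:
        raise ValueError("surviving a cloud outage needs at least two clouds")
    return (clouds - 1) / clouds


if __name__ == "__main__":
    peak = 1000.0  # hypothetical peak load, in arbitrary capacity units
    for k in (2, 3):
        total = capacity_per_cloud(peak, k) * k
        print(f"{k} clouds: provision {total:.0f} units, "
              f"utilization {normal_utilization(k):.0%}")
```

Under these assumptions, two clouds each carry a full copy of peak capacity, so utilization is 50% in normal operation. Elastic scale-up during an outage, which this sketch ignores, is one way to reduce that overhead.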

3. Efficiency

Both development and operations efficiency suffer when the clouds are not at parity, and the resulting friction feeds back into lower stability. Without equal deployment, failover only works if it is rehearsed through continuous drills, and any lapse can leave the architecture ineffective during a single‑cloud outage.

Design Goals

To achieve the desired stability and cost benefits, the online business at Zuoyebang adopts a multi‑cloud active‑active strategy, leveraging Kubernetes’ unified north‑bound APIs to smooth out differences across clouds.

Architecture Overview

Network

Zuoyebang has developed a multi‑cloud networking + CPE control solution that provides inter‑cloud connectivity, elastic bandwidth scaling, cross‑cloud traffic observability, automatic failover for node/line failures, and rapid onboarding of new cloud providers.

Compute

Under a single‑cloud model, the SYS team faces heavy maintenance of diverse instance types. Multi‑cloud introduces a combinatorial explosion of instance varieties, which can only be managed by standardizing on a limited set of primary instance types and applying scenario‑based packages.

Container Technology

Containers bridge IaaS differences, but cloud‑native advantages such as co‑locating online and offline workloads and Serverless require a standardized container middleware layer across all major cloud providers.

Service Registration & Discovery

Service registration and discovery must be transparent to business logic so that hybrid‑cloud scheduling causes minimal disruption. Zuoyebang replaced its service registry with a solution that allows a smooth transition and supports both synchronous RPC and asynchronous calls.

Service Observation

Unified logging, monitoring, and tracing provide a single view, reducing the impact of hybrid‑cloud complexity.

Traffic Scheduling

North‑south traffic is routed primarily by domain name. Precise multi‑cloud traffic distribution aims for ±1% error, with failover recovery within five minutes. Zuoyebang built a DoH‑based solution on CoreDNS to supplement DNS for robust traffic steering.
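To illustrate what splitting traffic by a target ratio and failing over can look like, here is a minimal sketch using smooth weighted round-robin. This is a generic technique chosen for illustration, not Zuoyebang's DoH/CoreDNS implementation; the cloud names, weights, and function names are hypothetical.

```python
from collections import Counter


def smooth_weighted_round_robin(weights: dict[str, int], requests: int) -> list[str]:
    """Pick a target per request so that shares follow the configured weights."""
    if not weights:
        raise ValueError("at least one target is required")
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(requests):
        # Every target gains its weight; the leader is picked and pays back the total.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


def healthy_weights(weights: dict[str, int], unhealthy: set[str]) -> dict[str, int]:
    """Drop failed clouds so the remaining ones absorb their share."""
    remaining = {name: w for name, w in weights.items() if name not in unhealthy}
    if not remaining:
        raise ValueError("no healthy cloud left to receive traffic")
    return remaining


if __name__ == "__main__":
    target = {"cloud_a": 60, "cloud_b": 40}  # hypothetical 60/40 split
    print(Counter(smooth_weighted_round_robin(target, 1000)))
    failover = healthy_weights(target, {"cloud_a"})
    print(Counter(smooth_weighted_round_robin(failover, 1000)))
```

A deterministic scheduler like this hits the configured ratio exactly over whole cycles. Steering by DNS answers is less precise because resolvers cache responses, which is presumably why a tolerance such as ±1% and a recovery window are stated as goals.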

Data Storage

Multi‑cloud storage faces classic CAP trade‑offs. Depending on business needs, Zuoyebang adopts a master‑slave, unit‑based, or MGR (MySQL Group Replication) solution to balance availability and consistency.

Application Layer

Two primary requirements exist: (1) prohibit cross‑cloud calls during normal operation to avoid “snowball” effects, and (2) allow flexible traffic routing in special cases such as data‑center migration or single‑cloud incidents. Zuoyebang resolves this with isolation zones plus a connectivity zone, and isolates real‑time big‑data services that lack native multi‑cloud support.
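The two requirements can be pictured as an endpoint-selection rule. The sketch below is one possible reading of "isolation zone" and "connectivity zone", not the actual Zuoyebang mechanism: callers in an isolation zone only ever see endpoints in their own cloud, while callers explicitly allowed to cross clouds fall back to remote endpoints when no local one exists. The `Endpoint` type, the `cross_cloud_allowed` flag, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    address: str
    cloud: str


def select_endpoints(
    caller_cloud: str,
    endpoints: list[Endpoint],
    cross_cloud_allowed: bool = False,
) -> list[Endpoint]:
    """Return the endpoints a caller may use under the isolation rule."""
    local = [e for e in endpoints if e.cloud == caller_cloud]
    # Isolation zone: never leave the caller's cloud, even if that means no endpoint.
    # Failing fast here avoids cross-cloud calls piling up during an incident.
    if local or not cross_cloud_allowed:
        return local
    # Connectivity zone: cross-cloud access is an explicit, opt-in fallback.
    return list(endpoints)


if __name__ == "__main__":
    registry = [
        Endpoint("10.0.0.1:8080", "cloud_a"),
        Endpoint("10.1.0.1:8080", "cloud_b"),
    ]
    print(select_endpoints("cloud_a", registry))
    print(select_endpoints("cloud_c", registry))
    print(select_endpoints("cloud_c", registry, cross_cloud_allowed=True))
```

Keeping the cross-cloud path off by default means a migration or incident requires a deliberate switch, rather than silently spreading load across the inter-cloud link.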

Conclusion

The multi‑cloud active‑active architecture at Zuoyebang is not merely about managing multiple Kubernetes clusters or traffic routing; it represents a comprehensive enterprise solution spanning resources, platforms, and applications, requiring close collaboration across SYS, container R&D, middleware, SRE, DBA, DevOps, FinOps, and security teams.
