Why Enterprises Need Multi‑Cluster Kubernetes and How to Implement It
This article explains why modern enterprises adopt multiple Kubernetes clusters, covering single‑cluster capacity limits, hybrid‑cloud requirements, fault‑tolerance concerns, the benefits of multi‑cluster setups, architectural models, and community‑driven implementation patterns.
As Kubernetes becomes increasingly adopted in enterprises, many companies operate multiple clusters in production. This article discusses considerations for multi‑cluster Kubernetes, including why to choose it, its benefits, and implementation approaches.
VMware's State of Kubernetes 2020 report noted that 20% of organizations run more than 40 clusters.
Why do enterprises need multiple clusters?
Single‑cluster capacity limits
The official documentation for v1.12 states that a single Kubernetes cluster supports up to 5,000 nodes, 150,000 total pods, 300,000 total containers, and no more than 110 pods per node. These limits remained unchanged through v1.20, suggesting that raising single-cluster capacity is not a community priority.
If a workload requires more than 5,000 nodes, enterprises must consider running multiple clusters.
Hybrid‑cloud or multi‑cloud architectures
Multi‑cloud or hybrid‑cloud setups are common. Global companies may run services across regions, or combine on‑premises data centers with public clouds such as Alibaba Cloud for elastic traffic. Public clouds also have finite resources and require advance provisioning for large promotions.
To avoid vendor lock‑in and control costs, many enterprises adopt multi‑cloud architectures, which naturally lead to multiple clusters.
Don't put all your eggs in one basket
Deploying all workloads to a single cluster creates a single point of failure. If the control plane fails, all services are impacted. Although the control plane is designed to be highly available, production incidents have shown that heavy API‑server traffic can cause outages.
Therefore, production environments need strict API‑server access controls, thorough testing, and possibly separating business workloads from infrastructure.
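As one concrete form of such API-server access control, Kubernetes offers API Priority and Fairness (beta as of v1.20). The sketch below, in which the service-account and priority-level names are illustrative, routes requests from a batch service account into a low-priority queue so bulk traffic cannot starve the control plane:

```yaml
# Sketch only: FlowSchema assigning a batch workload's requests to the
# built-in "workload-low" priority level (subject names are assumptions).
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: FlowSchema
metadata:
  name: low-priority-batch
spec:
  priorityLevelConfiguration:
    name: workload-low          # queue with limited concurrency shares
  matchingPrecedence: 1000      # evaluated after more specific schemas
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: batch-jobs        # hypothetical heavy API consumer
        namespace: default
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
```

Requests matched by this schema queue behind higher-priority control-plane traffic instead of competing with it for API-server capacity.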
Benefits of multiple clusters
Multiple clusters improve:
Availability
Isolation
Scalability
Multi‑cluster application architectures
Two common models:
Replica model: Deploy full copies of an application to several availability zones or data centers. Smart DNS or global load balancers route traffic to the nearest healthy cluster, providing low latency and failover.
Service‑based partitioning: Deploy services to different clusters based on business relevance, offering strong isolation at the cost of more complex service division.
Community implementation patterns
Two main approaches are being explored.
Kubernetes‑centric
Extending core Kubernetes primitives to support multi‑cluster use cases, as done by the Kubernetes Cluster Federation (KubeFed) project. Federation provides a logical control plane above the member clusters, coordinating their individual control planes to enable cross‑cluster resource distribution and multi‑cluster service discovery.
Federation achieves:
Cross‑cluster resource propagation using Templates, Placement, and Overrides, allowing Deployments to be distributed and scaled across clusters.
Multi‑cluster service discovery for Services and Ingresses (currently alpha).
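To make the Template/Placement/Overrides model concrete, a KubeFed FederatedDeployment might look like the following sketch (cluster names, namespace, and image are illustrative):

```yaml
# Sketch only: distribute a Deployment to two member clusters,
# overriding the replica count in one of them.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: demo
  namespace: demo
spec:
  template:                      # the Deployment to propagate
    metadata:
      labels:
        app: demo
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: demo
      template:
        metadata:
          labels:
            app: demo
        spec:
          containers:
          - name: demo
            image: nginx:1.19
  placement:                     # which clusters receive it
    clusters:
    - name: cluster-beijing
    - name: cluster-shanghai
  overrides:                     # per-cluster customization
  - clusterName: cluster-shanghai
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5
```

The host cluster's KubeFed controllers render a plain Deployment per target cluster, applying each cluster's overrides before propagation.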
Network‑centric
This approach focuses on establishing network connections between clusters so that workloads can communicate directly. Service‑mesh solutions such as Istio, Linkerd, and Consul Mesh provide multi‑cluster traffic management. Cilium’s Cluster Mesh offers pod‑IP routing across clusters via tunnels or direct routing, without requiring gateways.
Each cluster maintains its own etcd; states never mix.
Etcd proxies expose cluster state; Cilium agents in other clusters watch changes and replicate relevant state.
Cross‑cluster access is read‑only, preventing fault propagation.
Configuration is done via a simple Kubernetes Secret containing the etcd proxy address, the cluster name, and certificates.
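A rough sketch of that Secret, modeled on Cilium's etcd-based Cluster Mesh setup (cluster name, endpoint, and file paths are illustrative):

```yaml
# Sketch only: one entry per remote cluster, keyed by cluster name.
# Cilium agents read these files to watch the remote etcd proxy.
apiVersion: v1
kind: Secret
metadata:
  name: cilium-clustermesh
  namespace: kube-system
stringData:
  cluster2: |
    endpoints:
    - https://cluster2.mesh.example.com:2379
    trusted-ca-file: /var/lib/cilium/clustermesh/cluster2-ca.crt
    cert-file: /var/lib/cilium/clustermesh/cluster2.crt
    key-file: /var/lib/cilium/clustermesh/cluster2.key
```

Mounting this Secret into the Cilium agents on each cluster gives them read-only access to the other cluster's state, in line with the isolation properties described above.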
Reflection
The two patterns are not mutually exclusive. In practice, many companies combine them: federation for deployment and release, and a service mesh for cross‑cluster traffic. In such a setup, workloads, the mesh control plane, and gateways must all integrate with an external service registry. The diagram below illustrates such a combined architecture.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.