Why Enterprises Adopt Multi‑Cluster Kubernetes and How to Deploy It

This article explains why modern enterprises need multiple Kubernetes clusters, covering single‑cluster limits, hybrid‑cloud requirements, and fault tolerance. It then compares two architectural models and reviews both Kubernetes‑centric federation and network‑centric service‑mesh solutions, with practical implementation guidance.

Efficient Ops

As Kubernetes adoption grows in enterprises, many organizations now operate multiple clusters in production. This article discusses the motivations for multi‑cluster Kubernetes, its benefits, and practical implementation approaches.

VMware’s 2020 Kubernetes Usage Report notes that 20% of organizations using Kubernetes run more than 40 clusters.

Why Enterprises Need Multiple Clusters

Single‑Cluster Capacity Limits

The official documentation states that a Kubernetes v1.12 cluster supports at most 5,000 nodes, 150,000 Pods in total, 300,000 containers in total, and 100 Pods per node. These limits were unchanged as of v1.20, which suggests that raising single‑cluster capacity is not a community priority. When a workload needs more than 5,000 nodes, or exceeds any of the other limits, it has to be spread across multiple clusters.
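As a rough illustration of the arithmetic, the sketch below checks a planned workload against the limits quoted above and estimates how many clusters an even split would need. The `Workload` type and both function names are hypothetical helpers invented for this example, and the even-split assumption is a simplification.

```python
import math
from dataclasses import dataclass

# Single-cluster limits as quoted in the text (Kubernetes v1.12 documentation).
MAX_NODES = 5_000
MAX_PODS = 150_000
MAX_CONTAINERS = 300_000
MAX_PODS_PER_NODE = 100


@dataclass
class Workload:
    """Hypothetical description of a planned workload."""
    nodes: int
    pods: int
    containers: int


def fits_single_cluster(w: Workload) -> bool:
    """True if the workload stays within every quoted limit."""
    return (
        w.nodes <= MAX_NODES
        and w.pods <= MAX_PODS
        and w.containers <= MAX_CONTAINERS
        and w.pods <= w.nodes * MAX_PODS_PER_NODE
    )


def clusters_needed(w: Workload) -> int:
    """Minimum cluster count if nodes, Pods, and containers are split evenly."""
    return max(
        1,
        math.ceil(w.nodes / MAX_NODES),
        math.ceil(w.pods / MAX_PODS),
        math.ceil(w.containers / MAX_CONTAINERS),
    )


if __name__ == "__main__":
    plan = Workload(nodes=12_000, pods=200_000, containers=350_000)
    print(fits_single_cluster(plan), clusters_needed(plan))
```

In this example the node count is the binding constraint: 12,000 nodes divided by 5,000 per cluster rounds up to three clusters, even though the Pod and container totals would fit in two.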

Hybrid‑Cloud or Multi‑Cloud Architecture

Many companies adopt hybrid or multi‑cloud setups to serve global users, combine private data centers with public clouds (e.g., Alibaba Cloud for burst traffic), avoid vendor lock‑in, and control costs. Such architectures naturally require separate clusters per cloud provider.

Don’t Put All Eggs in One Basket

If the control plane of a single cluster fails, every service in that cluster is affected. Although the Kubernetes control plane is designed for high availability, real‑world incidents show that heavy API‑server traffic can still cause outages. Production environments therefore restrict access to the API server, test changes thoroughly, and often separate business workloads from infrastructure. Running several clusters follows the same logic as using many ordinary machines instead of one supercomputer.

Benefits of Multi‑Cluster

Multi‑cluster deployments improve:

Availability

Isolation

Scalability

Multi‑Cluster Application Architecture

Two common models are used:

Replica Model: Deploy full application replicas across multiple availability zones or data centers. Traffic is routed to the nearest healthy cluster via Smart DNS or global load balancers, enabling failover.

Service‑Based Partitioning: Assign services to different clusters according to business domain. This provides strong isolation at the cost of added complexity.
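The routing decision in the Replica Model can be sketched as follows. This is a toy model of what a Smart DNS or global load balancer does, not any particular product's algorithm. The cluster names, regions, and the latency table are made-up values for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    name: str
    region: str
    healthy: bool


# Hypothetical round-trip latencies in milliseconds: client region -> cluster region.
LATENCY_MS = {
    "eu": {"eu": 10, "us": 90, "ap": 180},
    "us": {"eu": 90, "us": 10, "ap": 150},
}


def pick_cluster(client_region: str, clusters: list) -> Optional[Cluster]:
    """Return the healthy cluster with the lowest latency, or None if all are down."""
    healthy = [c for c in clusters if c.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda c: LATENCY_MS[client_region][c.region])


if __name__ == "__main__":
    replicas = [
        Cluster("prod-eu", "eu", healthy=False),  # simulated outage
        Cluster("prod-us", "us", healthy=True),
        Cluster("prod-ap", "ap", healthy=True),
    ]
    # A European client fails over to the next-nearest healthy replica.
    print(pick_cluster("eu", replicas))
```

Because every cluster holds a full replica, failover is only a routing change. Under Service‑Based Partitioning the same outage would take down the services assigned to that cluster unless they are also replicated elsewhere.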

Community Multi‑Cluster Solutions

Two main approaches have emerged:

Kubernetes‑Centric

This approach extends core Kubernetes primitives to support multi‑cluster use cases, providing a centralized management plane. The Kubernetes Cluster Federation project (KubeFed) exemplifies this method; it can be pictured as a meta‑cluster that orchestrates multiple Kubernetes control planes.

Federation essentially performs two tasks:

Cross‑cluster resource distribution using Templates, Placement, and Overrides, enabling multi‑cluster scaling.

Multi‑cluster service discovery supporting Services and Ingresses (still in alpha, requiring additional development for production use).
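To make the Template, Placement, and Overrides idea concrete, here is a minimal sketch that expands one federated object into per‑cluster manifests. It only mimics the concept in plain Python and is not Federation's implementation. The field layout is modeled on a federated Deployment as an assumption, and the cluster names, the `render` function, and the `set_path` helper are hypothetical.

```python
import copy


def set_path(obj: dict, path: str, value) -> None:
    """Set a value at a slash-separated path such as '/spec/replicas'."""
    keys = path.strip("/").split("/")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def render(federated: dict) -> dict:
    """Return {cluster name: manifest} for every cluster named in the placement."""
    spec = federated["spec"]
    overrides = {
        o["clusterName"]: o["clusterOverrides"] for o in spec.get("overrides", [])
    }
    manifests = {}
    for cluster in spec["placement"]["clusters"]:
        name = cluster["name"]
        manifest = copy.deepcopy(spec["template"])  # Template: shared base object
        for patch in overrides.get(name, []):       # Overrides: per-cluster changes
            set_path(manifest, patch["path"], patch["value"])
        manifests[name] = manifest
    return manifests


# Assumed layout, for illustration only.
federated_deployment = {
    "kind": "FederatedDeployment",
    "metadata": {"name": "web"},
    "spec": {
        "template": {"metadata": {"labels": {"app": "web"}}, "spec": {"replicas": 3}},
        "placement": {"clusters": [{"name": "cluster-a"}, {"name": "cluster-b"}]},
        "overrides": [
            {
                "clusterName": "cluster-b",
                "clusterOverrides": [{"path": "/spec/replicas", "value": 5}],
            }
        ],
    },
}

if __name__ == "__main__":
    for cluster_name, manifest in render(federated_deployment).items():
        print(cluster_name, manifest["spec"]["replicas"])
```

Here cluster-a receives the template unchanged with three replicas, while cluster-b receives the same object with its replica count overridden to five.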

Network‑Centric

This method focuses on establishing network connections between clusters so that applications can communicate across them. Service‑mesh solutions such as Istio, Linkerd, and Consul Mesh provide multi‑cluster connectivity, while Cilium’s Cluster Mesh offers a CNI‑based solution that routes Pod IPs across clusters without gateways.

Cilium Cluster Mesh works as follows:

Each Kubernetes cluster maintains its own etcd, keeping states isolated.

Etcd proxies expose each cluster’s etcd; Cilium agents in other clusters monitor changes and replicate relevant state.

Cross‑cluster access is read‑only, preventing fault propagation.

Configuration is stored in a simple Kubernetes Secret containing remote etcd proxy addresses, cluster names, and TLS certificates.
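The isolation and read‑only properties described above can be illustrated with a toy model. This is not Cilium code: `ClusterState` stands in for a cluster's own etcd, the read‑only view stands in for the etcd proxy, and `Agent` stands in for a Cilium agent that merges local and remote state. All class names, service names, and IP addresses are invented for the example.

```python
from types import MappingProxyType


class ClusterState:
    """Per-cluster state store; only the owning cluster can write to it."""

    def __init__(self, name: str):
        self.name = name
        self._services = {}  # service name -> tuple of Pod IPs

    def set_backends(self, service: str, pod_ips) -> None:
        self._services[service] = tuple(pod_ips)

    def read_only_view(self) -> MappingProxyType:
        """Live view that reflects later updates but rejects writes."""
        return MappingProxyType(self._services)


class Agent:
    """Merges local backends with backends read from remote clusters."""

    def __init__(self, local: ClusterState, remote_views: dict):
        self.local = local
        self.remote_views = remote_views  # cluster name -> read-only view

    def backends(self, service: str) -> list:
        ips = list(self.local.read_only_view().get(service, ()))
        for view in self.remote_views.values():
            ips.extend(view.get(service, ()))
        return ips


if __name__ == "__main__":
    a, b = ClusterState("cluster-a"), ClusterState("cluster-b")
    a.set_backends("web", ["10.1.0.5"])
    b.set_backends("web", ["10.2.0.7"])

    agent_a = Agent(a, {"cluster-b": b.read_only_view()})
    print(agent_a.backends("web"))  # local and remote Pod IPs together

    try:
        agent_a.remote_views["cluster-b"]["web"] = ()  # remote state is not writable
    except TypeError as err:
        print("write rejected:", err)
```

The point of the design is that a misbehaving agent in one cluster can at worst read stale data from another cluster; it cannot corrupt the other cluster's state.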

Thoughts

The two approaches are not mutually exclusive. Many organizations combine cluster federation for deployment and release management with a service‑mesh for cross‑cluster traffic. In such hybrid architectures, workload clusters, the service‑mesh control plane, and gateways must integrate with external registries. The diagram below illustrates a typical combined solution.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
