
Multi-Cluster Management in Kubernetes: Concepts, Practices, and Karmada Exploration

This article explains why enterprises adopt multi‑cluster Kubernetes architectures, reviews community solutions such as Karmada, Clusternet, and OCM, and details vivo’s hybrid strategy: a unified UI for independent clusters combined with Karmada‑based federation for resource distribution, elastic scaling, cross‑cluster scheduling, and gray‑release migration.

vivo Internet Technology

Why Multi-Cluster?

With the rapid development of Kubernetes and cloud‑native technologies, enterprises run into the capacity limits of a single cluster, vendor lock‑in, burst traffic, high‑availability requirements, active‑active data, and data‑locality constraints. These pressures motivate the adoption of multi‑cluster and hybrid‑cloud architectures.

Multi‑Cluster Exploration

Community projects for multi‑cluster management include:

Federation v1 – deprecated because it introduced a separate API layer that conflicted with native Kubernetes APIs.

Federation v2 (KubeFed) – also retired; it wraps each resource in a Federated CRD instead of exposing native Kubernetes APIs, which raised adoption cost and limited ecosystem compatibility.

Karmada – builds on Federation v2 concepts, adding native API support, high‑availability deployment, automatic fault‑tolerance, cross‑cluster autoscaling, and service discovery.

Clusternet – an open‑source cloud‑native platform for managing multiple clusters and cross‑cluster application orchestration.

OCM – simplifies multi‑cloud cluster management and provides an extensible add‑on framework.

Other related projects cover cluster lifecycle management (Cluster API), multi‑cluster resource search (Clusterpedia), cross‑cluster pod connectivity (Submariner), multi‑cluster ingress, service mesh (Istio, Cilium), and storage migration.
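Karmada's native‑API approach means users keep writing standard Kubernetes manifests and attach a PropagationPolicy describing where they should run. A minimal sketch distributing a Deployment to two member clusters (cluster and workload names are illustrative):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx            # an ordinary, unmodified Deployment manifest
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaSchedulingType: Duplicated   # full copy in each cluster
```

Because the Deployment itself is untouched native API, existing tooling and CI/CD pipelines continue to work unchanged.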

vivo’s Non‑Federated Multi‑Cluster Management

vivo uses a unified web UI to import Kubernetes clusters, view resources, and create Deployments, Services, and LoadBalancers. The system integrates CI/CD, monitoring, and alerting, keeping each cluster independent while providing a consolidated view.

vivo’s Federated Multi‑Cluster Exploration

Federated clusters unify resource management and scheduling via Karmada. Four focus areas are:

Resource distribution and orchestration

Elastic burst handling

Multi‑cluster scheduling

Service governance and traffic routing

Network capabilities rely on Service Mesh or Mesh Federation to enable cross‑cluster traffic steering and fault‑tolerance.
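Cross‑cluster traffic steering of this kind is typically expressed as mesh configuration rather than Kubernetes scheduling. A hedged sketch using an Istio VirtualService to split traffic between a local and a remote cluster (hostnames and subset names are assumptions; the subsets would be defined in a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: demo-app
spec:
  hosts:
    - demo-app.prod.svc.cluster.local
  http:
    - route:
        - destination:
            host: demo-app.prod.svc.cluster.local
            subset: local-cluster     # endpoints in the local cluster
          weight: 80
        - destination:
            host: demo-app.prod.svc.cluster.local
            subset: remote-cluster    # endpoints reached via mesh federation
          weight: 20
```

Shifting the weights (or dropping a subset entirely) gives gradual cross‑cluster migration and failover without touching the workloads themselves.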

Application‑Centric Multi‑Cluster Practices

Key capabilities include a unified cloud‑native application definition, a common distribution center, Service Mesh‑based cross‑cloud governance, and native K8s multi‑cluster delivery.

3.1 Application Release

vivo registers multiple clusters with Karmada, which handles resource scheduling and fail‑over. The container platform manages K8s resources, Karmada policies, and CI/CD pipelines that generate manifests and deliver them via the platform API.

OpenKruise is used for advanced release patterns (e.g., game‑type workloads) involving ConfigMap, Secret, PV, PVC, Service, CloneSet, and custom resources. A relational database tracks release state, enabling rolling and gray releases.
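For the gray releases mentioned above, OpenKruise's CloneSet supports partitioned updates that hold part of the fleet on the old version while a batch is verified. A minimal sketch (app name, image, and batch size are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: demo-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: demo-app:v2       # the new version being rolled out
  updateStrategy:
    type: InPlaceIfPossible        # in-place update where feasible
    partition: 7                   # keep 7 pods on the old version; 3 form the gray batch
```

Lowering `partition` step by step (7 → 0) advances the gray release; raising it back rolls the batch back.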

Resource Interpreter Hooks (Karmada)

The following hook points extend Karmada to understand custom resources:

InterpretReplica – determines the replica count from a ResourceTemplate.

ReviseReplica – adjusts the replica count on the Work object based on scheduling decisions.

Retain – preserves fields that may be modified in member clusters.

AggregateStatus – aggregates status from member clusters back to the ResourceTemplate.
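Karmada allows these hooks to be supplied declaratively as Lua scripts through a ResourceInterpreterCustomization. A sketch covering the first two hooks for an OpenKruise CloneSet (script bodies kept minimal):

```yaml
apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: cloneset-interpreter
spec:
  target:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
  customizations:
    replicaResource:               # InterpretReplica hook
      luaScript: |
        function GetReplicas(obj)
          -- report desired replicas (and nil resource requirement) to the scheduler
          return obj.spec.replicas, nil
        end
    replicaRevision:               # ReviseReplica hook
      luaScript: |
        function ReviseReplica(obj, desiredReplica)
          -- rewrite replicas on the Work object per scheduling decision
          obj.spec.replicas = desiredReplica
          return obj
        end
```

Retention and status aggregation can be added the same way under `retention` and `statusAggregation`.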

3.2 Elastic Scaling

FedHPA – a federated Horizontal Pod Autoscaler that uses native HPA objects in each cluster, coordinated by a FedHPA controller on the Karmada control plane.

FedHPA workflow:

User creates an HPA specifying workload, CPU limits, min/max.

FedController computes resource distribution and creates HPA objects in each cluster with proportional min/max.

Clusters scale out when CPU thresholds are hit; status is reported back to Karmada.

FedHPA controller reconciles replica counts across clusters, ensuring consistency.

Resource re‑balancing adjusts min/max when cluster capacity changes.
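Step 2 above amounts to writing plain `autoscaling/v2` HPA objects into each member cluster with proportionally split bounds. For example, with a global min=3/max=10 and a 60/40 weight split, the 60% cluster might receive the following (workload name and CPU threshold are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2        # ceil(60% of global min 3)
  maxReplicas: 6        # 60% of global max 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The 40% cluster would get min=1/max=4; each cluster then scales locally while the FedHPA controller reconciles the totals.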

CronHPA – a time‑based scaler defined by a CronHPA custom resource; the controller triggers scaling at configured times using a go‑cron library.

Manual and targeted scaling are also supported via OpenKruise’s CloneSet and AdvancedStatefulSet, with Karmada’s Resource Interpreter handling custom resource propagation.
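vivo's CronHPA CRD is internal, so the schema below is purely hypothetical, but it illustrates the shape such a time‑based scaler could take, using the six‑field cron expressions accepted by common go‑cron libraries:

```yaml
# Hypothetical schema – field names are illustrative, not vivo's actual CRD.
apiVersion: autoscaling.example.com/v1alpha1
kind: CronHPA
metadata:
  name: demo-app-cron
spec:
  scaleTargetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: demo-app
  jobs:
    - name: scale-up-for-peak
      schedule: "0 0 8 * * *"     # every day at 08:00
      targetReplicas: 20
    - name: scale-down-off-peak
      schedule: "0 0 22 * * *"    # every day at 22:00
      targetReplicas: 5
```

The controller watches these objects and, when a schedule fires, patches the target workload's replica count.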

3.3 Unified Scheduling

Multi‑cluster scheduling in Karmada distributes replicas across clusters based on static or dynamic schedulers, handling fault‑tolerant migration and re‑scheduling when a cluster fails.
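The choice between static and dynamic scheduling is made per workload in the PropagationPolicy. A sketch that divides replicas by each member cluster's available capacity rather than fixed weights (resource names are illustrative):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: demo-dynamic
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-app
  placement:
    replicaScheduling:
      replicaSchedulingType: Divided          # split replicas across clusters
      replicaDivisionPreference: Weighted
      weightPreference:
        dynamicWeight: AvailableReplicas      # weight clusters by free capacity
```

With `AvailableReplicas`, a cluster under resource pressure automatically receives fewer replicas, and failed placements are picked up by re‑scheduling.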

Re‑scheduling handles workloads that fail to launch after initial placement because member‑cluster resources have changed, moving the affected replicas to clusters that can run them.

Single‑cluster scheduler simulators are used for capacity estimation, but they differ from production schedulers; custom simulators can be built with fake clients.

3.4 Gray Release and Migration

Application migration moves non‑federated workloads into Karmada without user impact, using a whitelist in the container platform. Rollback removes applications from the whitelist and adds annotations to prevent Karmada from modifying resources.

Migration strategy follows “test → pre‑release → production”, with staged gray releases (1:2:7 ratio) and monitoring checkpoints.

Conclusion

vivo’s current approach combines non‑federated multi‑cluster management with CI/CD for static releases, offering rolling, gray, manual, targeted, and elastic scaling. Federated capabilities (resource federation, cross‑cluster scheduling, fault‑tolerance) are still under exploration. Enterprises should evaluate their needs, establish robust O&M and monitoring, and build migration/rollback mechanisms for existing non‑federated resources.

Tags: Kubernetes, resource management, multi-cluster, scheduling, scaling, cloud-native, Karmada
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
