
How KubeAdmiral Redefines Multi-Cluster Kubernetes Federation for Scale and Efficiency

As Kubernetes became the de-facto standard for container orchestration, ByteDance hit the scaling limits of single-cluster setups. That prompted the adoption of KubeFed V2 and, later, the development of KubeAdmiral, a next-generation multi-cluster federation system that improves scheduling, resource efficiency, native API support, and dynamic scaling across clouds.


Background

Since its open-source launch in 2014, Kubernetes has become the de-facto standard for container orchestration, but the roughly 5,000-node ceiling of a single cluster can no longer satisfy enterprise-scale scenarios. ByteDance's internal Kubernetes clusters grew to more than 500, with individual applications ranging from zero to 20,000 replicas and a single workload exceeding one million CPU cores.

Early on, each business line ran its own exclusive clusters for isolation and security, which led to resource islands, poor elasticity, and heavy manual effort to reallocate resources across clusters. The rise of multi-cloud and hybrid-cloud architectures further intensified the need for a federation layer that decouples applications from clusters and provides a unified entry point.

KubeFed V2 Implementation at ByteDance

In 2019, the infrastructure team built a federation platform on top of the community KubeFed V2 project. KubeFed distinguishes between a control-plane cluster and member clusters: users create "federated objects" in the control plane, and a set of controllers distributes the underlying resources to the member clusters. A federated object consists of three parts: a Template, a Placement, and Overrides.

apiVersion: types.kubefed.k8s.io/v1beta1
kind: FederatedDeployment
metadata:
  name: test-deployment
  namespace: test-namespace
spec:
  template: # define the full Deployment spec
    metadata:
      labels:
        app: nginx
    spec:
      ...
  placement:
    # distribute to two specific clusters
    clusters:
    - name: cluster1
    - name: cluster2
  overrides:
    # modify replica count in cluster2
    - clusterName: cluster2
      clusterOverrides:
      - path: spec.replicas
        value: 5

KubeFed also supports ReplicaSchedulingPreference (RSP) for advanced replica distribution, allowing per‑cluster weight, min, and max settings.
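
For reference, an RSP is a separate object that targets a federated workload of the same name. The sketch below, based on the KubeFed scheduling API, splits nine replicas between two clusters by weight and bounds cluster2 with minimum and maximum values; the field values are illustrative only.

apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment        # must match the FederatedDeployment name
  namespace: test-namespace
spec:
  targetKind: FederatedDeployment
  totalReplicas: 9
  clusters:
    cluster1:
      weight: 2                # gets roughly two thirds of the replicas
    cluster2:
      weight: 1
      minReplicas: 2           # at least 2 replicas in cluster2
      maxReplicas: 5           # at most 5 replicas in cluster2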

Limitations of KubeFed in Production

Low resource utilization – static RSP weights cannot adapt to changes in cluster capacity.

Unsmooth scaling – during scale-out and scale-in, replicas can end up unevenly distributed across clusters.

Limited scheduling semantics – works well for stateless resources but poorly for stateful services and batch jobs.

High integration cost – users must create federated objects rather than use native Kubernetes APIs, which raises the barrier to adoption.

KubeAdmiral: Next‑Generation Federation

To meet growing requirements for efficiency, scale, performance, and cost, ByteDance began developing KubeAdmiral in late 2021, evolving it from KubeFed V2. The name combines "admiral" (fleet commander) with "Kube", emphasizing powerful multi-cluster orchestration.

KubeAdmiral supports native Kubernetes APIs, offers an extensible scheduling framework, and refines scheduling algorithms for better replica distribution.

Rich Multi‑Cluster Scheduling Capabilities

The scheduler is the core component of the federation system: it determines how replicas are distributed across member clusters, which directly affects multi-cluster disaster recovery, resource efficiency, and stability.

KubeAdmiral introduces richer scheduling semantics through the PropagationPolicy object, allowing clusters to be selected by label, taint toleration, or affinity, and adds support for stateful and batch workloads.

apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  # multiple cluster selection methods, final result is the intersection
  placement:
    - cluster: Cluster-01
      preferences:
        weight: 40
    - cluster: Cluster-02
      preferences:
        weight: 30
    - cluster: Cluster-03
      preferences:
        weight: 40
  clusterSelector:
    IPv6: "true"
  clusterAffinity:
    - matchExpressions:
      - key: region
        operator: In
        values:
        - beijing
  tolerations:
    - key: "key1"
      operator: Equal
      value: "value1"
      effect: NoSchedule
  schedulingMode: Divide
  stickyCluster: false
  maxClusters: 1
  disableFollowerScheduling: false

For per‑cluster customizations, OverridePolicy can apply JSON‑Patch modifications based on cluster name or selector.

apiVersion: core.kubeadmiral.io/v1alpha1
kind: OverridePolicy
metadata:
  name: example
  namespace: default
spec:
  overrideRules:
  - targetClusters:
      clusters:
      - member1
      - member2
      clusterSelector:
        region: beijing
        az: zone1
      clusterAffinity:
        - matchExpressions:
          - key: region
            operator: In
            values:
            - beijing
          - key: provider
            operator: In
            values:
            - volcengine
      overriders:
        jsonpatch:
        - path: "/spec/template/spec/containers/0/image"
          operator: replace
          value: "nginx:test"

Extensible Scheduler Architecture

KubeAdmiral’s scheduler mirrors the kube‑scheduler design, abstracting the process into Filter, Score, Select, and Replica stages. Each stage is implemented by independent plugins, allowing custom logic without modifying the core control plane. Plugins can also be invoked via HTTP for external extensions.

Automatic Migration on Scheduling Failure

If a member cluster cannot schedule a replica due to node failures, taints, or affinity conflicts, KubeAdmiral can automatically migrate those replicas to other clusters with spare capacity, ensuring the overall replica count remains satisfied.
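
A minimal sketch of how this might be enabled on the PropagationPolicy from the earlier example is shown below. The autoMigration stanza and its field names (when.podUnschedulableFor, keepUnschedulableReplicas) are assumptions based on the feature description, so check the KubeAdmiral API reference for the exact schema.

apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  schedulingMode: Divide
  # Assumed field names: trigger migration when replicas stay unschedulable
  # in a member cluster for longer than the given duration.
  autoMigration:
    when:
      podUnschedulableFor: 30s
    keepUnschedulableReplicas: false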

Dynamic Weight Scheduling Based on Cluster Water‑Level

Instead of static RSP weights, KubeAdmiral collects each cluster's total and allocated resources, computes the available capacity, and uses that as the replica-distribution weight. For example, if cluster A has three times the spare capacity of cluster B, roughly three quarters of newly scheduled replicas land in cluster A. This keeps member clusters balanced and maintains deployment rates above 95%.

Improved Replica Allocation Algorithm

KubeAdmiral refines KubeFed's replica allocation algorithm to avoid unexpected migrations during scale-out and scale-in. The new algorithm first computes the desired distribution across clusters, then adjusts it based on the distance between the current and desired states, so that scaling disturbs existing replicas as little as possible and placement remains predictable.

Native Resource Support and Status Aggregation

KubeAdmiral accepts native Kubernetes resources (e.g., Deployment) and automatically converts them into internal federated objects, lowering the migration barrier. It also aggregates the status fields from all member clusters into a single view, providing a unified health snapshot.
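
As a sketch of what this looks like for a user, a plain Deployment can be submitted to the federation control plane as-is and bound to the mypolicy PropagationPolicy from the earlier example by label. The kubeadmiral.io/propagation-policy-name label key below is taken from the KubeAdmiral documentation as best recalled; verify the exact key against the project docs.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  labels:
    # binds this native Deployment to the PropagationPolicy named "mypolicy"
    kubeadmiral.io/propagation-policy-name: mypolicy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80

KubeAdmiral converts the Deployment into its internal federated representation, distributes it according to the referenced policy, and aggregates status from the member clusters back into a single view, as described above.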

Future Directions

Continue enhancing scheduling for stateful services and batch jobs, adding auto‑migration and cost‑aware scheduling.

Improve user experience with out‑of‑the‑box solutions to reduce cognitive load.

Boost observability, refine logs and metrics, and increase scheduler explainability.

Explore one‑click federation and multi‑cluster migration features.

KubeAdmiral has been incubated within ByteDance for years, powering the TCE platform that manages over 210,000 machines and 10 million pods across services like Douyin and Toutiao. It is now open‑source on GitHub.

GitHub: https://github.com/kubewharf/kubeadmiral

Tags: Kubernetes, Multi-Cluster, Scheduling, Federation, KubeAdmiral
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
