How KubeAdmiral Redefines Multi-Cluster Kubernetes Federation for Scale and Efficiency
Since Kubernetes became the de‑facto standard, ByteDance faced scaling limits with single‑cluster setups, prompting the adoption of KubeFed V2 and later the development of KubeAdmiral, a next‑generation multi‑cluster federation system that enhances scheduling, resource efficiency, native API support, and dynamic scaling across clouds.
Background
Since its open‑source launch in 2014, Kubernetes has become the de‑facto standard for container orchestration, but the community‑supported single‑cluster limit of roughly 5,000 nodes can no longer satisfy enterprise‑scale scenarios. ByteDance's internal fleet grew beyond 500 Kubernetes clusters, with applications scaling between zero and 20,000 replicas and a single workload exceeding one million cores.
Early isolation for security caused each business line to own exclusive clusters, leading to resource islands, low elasticity, and heavy manual effort to allocate resources across clusters. The rise of multi‑cloud and hybrid‑cloud architectures intensified the need for a federation layer that decouples applications from clusters and provides a unified entry point.
KubeFed V2 Implementation at ByteDance
In 2019 the infrastructure team built a federation on top of the community KubeFed V2 project. KubeFed distinguishes a control plane cluster from member clusters; users create "federated objects" in the control plane, and multiple controllers distribute those resources to member clusters. A federated object contains a Template, Placement, and Overrides.
```yaml
apiVersion: types.kubefed.k8s.io/v1beta1
kind: FederatedDeployment
metadata:
  name: test-deployment
  namespace: test-namespace
spec:
  template: # define the full Deployment spec
    metadata:
      labels:
        app: nginx
    spec:
      ...
  placement:
    # distribute to two specific clusters
    clusters:
      - name: cluster1
      - name: cluster2
  overrides:
    # modify the replica count in cluster2
    - clusterName: cluster2
      clusterOverrides:
        - path: spec.replicas
          value: 5
```
KubeFed also supports ReplicaSchedulingPreference (RSP) for advanced replica distribution, allowing per‑cluster weight, min, and max settings.
Limitations of KubeFed in Production
Low resource utilization – static RSP weights cannot adapt to changing cluster capacities.
Unsmooth scaling – instances may be unevenly distributed during scale‑out/in.
Limited scheduling semantics – good for stateless resources but poor for stateful services or jobs.
High integration cost – requires creating federated objects, which are incompatible with native APIs.
KubeAdmiral: Next‑Generation Federation
To meet higher efficiency, scale, performance, and cost requirements, ByteDance developed KubeAdmiral in late 2021, building on KubeFed V2. The name combines “admiral” (fleet commander) with “Kube”, emphasizing powerful multi‑cluster orchestration.
KubeAdmiral supports native Kubernetes APIs, offers an extensible scheduling framework, and refines scheduling algorithms for better replica distribution.
Rich Multi‑Cluster Scheduling Capabilities
The scheduler is the core component, calculating replica counts for each member cluster and influencing multi‑cluster disaster recovery, resource efficiency, and stability.
KubeAdmiral introduces richer scheduling semantics via PropagationPolicy, allowing cluster selection by labels, taints, or affinities, and supports stateful and batch workloads.
```yaml
apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  # multiple cluster selection methods; the final result is their intersection
  placement:
    - cluster: Cluster-01
      preferences:
        weight: 40
    - cluster: Cluster-02
      preferences:
        weight: 30
    - cluster: Cluster-03
      preferences:
        weight: 40
  clusterSelector:
    IPv6: "true"
  clusterAffinity:
    - matchExpressions:
        - key: region
          operator: In
          values:
            - beijing
  tolerations:
    - key: "key1"
      operator: Equal
      value: "value1"
      effect: NoSchedule
  schedulingMode: Divide
  stickyCluster: false
  maxClusters: 1
  disableFollowerScheduling: false
```
For per‑cluster customizations, OverridePolicy can apply JSON‑Patch modifications based on cluster name or label selector.
```yaml
apiVersion: core.kubeadmiral.io/v1alpha1
kind: OverridePolicy
metadata:
  name: example
  namespace: default
spec:
  overrideRules:
    - targetClusters:
        clusters:
          - member1
          - member2
        clusterSelector:
          region: beijing
          az: zone1
        clusterAffinity:
          - matchExpressions:
              - key: region
                operator: In
                values:
                  - beijing
              - key: provider
                operator: In
                values:
                  - volcengine
      overriders:
        jsonpatch:
          - path: "/spec/template/spec/containers/0/image"
            operator: replace
            value: "nginx:test"
```
Extensible Scheduler Architecture
KubeAdmiral’s scheduler mirrors the kube‑scheduler design, abstracting the process into Filter, Score, Select, and Replica stages. Each stage is implemented by independent plugins, allowing custom logic without modifying the core control plane. Plugins can also be invoked via HTTP for external extensions.
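The staged, plugin-based design can be sketched as follows. This is a minimal illustration of the Filter → Score → Select pipeline described above; the `Cluster` type, `FilterPlugin`/`ScorePlugin` interfaces, and plugin names are hypothetical, not KubeAdmiral's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a simplified member-cluster view used for illustration only.
type Cluster struct {
	Name      string
	Labels    map[string]string
	Available int64 // spare CPU millicores
}

// FilterPlugin and ScorePlugin mirror the Filter and Score stages in the
// text; real KubeAdmiral plugin interfaces differ.
type FilterPlugin interface{ Filter(c Cluster) bool }
type ScorePlugin interface{ Score(c Cluster) int64 }

// regionFilter keeps only clusters whose region label matches.
type regionFilter struct{ region string }

func (f regionFilter) Filter(c Cluster) bool { return c.Labels["region"] == f.region }

// capacityScore ranks clusters by spare capacity; higher is better.
type capacityScore struct{}

func (capacityScore) Score(c Cluster) int64 { return c.Available }

// schedule chains Filter -> Score -> Select: infeasible clusters are dropped,
// the rest are returned best-first.
func schedule(clusters []Cluster, f FilterPlugin, s ScorePlugin) []string {
	feasible := []Cluster{}
	for _, c := range clusters {
		if f.Filter(c) {
			feasible = append(feasible, c)
		}
	}
	sort.Slice(feasible, func(i, j int) bool {
		return s.Score(feasible[i]) > s.Score(feasible[j])
	})
	names := []string{}
	for _, c := range feasible {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	clusters := []Cluster{
		{Name: "cluster-1", Labels: map[string]string{"region": "beijing"}, Available: 2000},
		{Name: "cluster-2", Labels: map[string]string{"region": "shanghai"}, Available: 9000},
		{Name: "cluster-3", Labels: map[string]string{"region": "beijing"}, Available: 5000},
	}
	fmt.Println(schedule(clusters, regionFilter{"beijing"}, capacityScore{}))
	// prints [cluster-3 cluster-1]: cluster-2 is filtered out, the rest ranked by capacity
}
```

Because each stage is an interface, swapping in custom logic (or an HTTP-backed plugin) does not require touching the pipeline itself.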
Automatic Migration on Scheduling Failure
If a member cluster cannot schedule a replica due to node failures, taints, or affinity conflicts, KubeAdmiral can automatically migrate those replicas to other clusters with spare capacity, ensuring the overall replica count remains satisfied.
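The core of that behavior is a rebalancing decision: given replicas stuck in one cluster, find other clusters with spare capacity to absorb them. The sketch below is a hypothetical illustration of that idea only; the function, cluster names, and capacity units are made up, and KubeAdmiral's real auto-migration works through its scheduler, not a standalone helper.

```go
package main

import (
	"fmt"
	"sort"
)

// reschedule places replicas that a member cluster could not run onto other
// clusters with spare capacity. Returns the per-cluster moves and any
// remainder that no cluster could absorb. Hypothetical sketch only.
func reschedule(stuck int64, spare map[string]int64) (map[string]int64, int64) {
	names := make([]string, 0, len(spare))
	for n := range spare {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic iteration order

	moved := map[string]int64{}
	for _, n := range names {
		if stuck == 0 {
			break
		}
		take := spare[n]
		if take > stuck {
			take = stuck
		}
		if take > 0 {
			moved[n] = take
			stuck -= take
		}
	}
	return moved, stuck
}

func main() {
	// 5 replicas are unschedulable in their original cluster; place them elsewhere.
	moved, remaining := reschedule(5, map[string]int64{"c2": 3, "c3": 10})
	fmt.Println(moved, remaining)
	// prints map[c2:3 c3:2] 0 — c2 absorbs 3, c3 absorbs 2, nothing is left stranded
}
```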
Dynamic Weight Scheduling Based on Cluster Water‑Level
Instead of static RSP weights, KubeAdmiral collects each cluster’s total and used resources, computes available capacity, and uses it as the replica‑distribution weight. This keeps member clusters balanced and maintains deployment rates above 95%.
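The idea reduces to two steps: derive each cluster's weight from its available capacity, then split replicas proportionally to those weights. The sketch below illustrates this under stated assumptions; the function names and the choice to hand out rounding leftovers in cluster order are ours, not KubeAdmiral's.

```go
package main

import "fmt"

// dynamicWeights derives per-cluster scheduling weights from current
// water-levels: weight = total - used (i.e. available capacity).
func dynamicWeights(total, used map[string]int64) map[string]int64 {
	w := map[string]int64{}
	for name, t := range total {
		avail := t - used[name]
		if avail < 0 {
			avail = 0
		}
		w[name] = avail
	}
	return w
}

// distribute splits replicas across clusters proportionally to weight.
// Rounding leftovers are handed out one by one in the given order.
func distribute(replicas int64, order []string, w map[string]int64) map[string]int64 {
	var sum int64
	for _, n := range order {
		sum += w[n]
	}
	out := map[string]int64{}
	if sum == 0 {
		return out // no cluster has spare capacity
	}
	var assigned int64
	for _, n := range order {
		out[n] = replicas * w[n] / sum
		assigned += out[n]
	}
	for i := 0; assigned < replicas; i++ {
		out[order[i%len(order)]]++
		assigned++
	}
	return out
}

func main() {
	total := map[string]int64{"c1": 10000, "c2": 10000}
	used := map[string]int64{"c1": 8000, "c2": 4000}
	w := dynamicWeights(total, used) // c1: 2000, c2: 6000
	fmt.Println(distribute(8, []string{"c1", "c2"}, w))
	// prints map[c1:2 c2:6] — the less-loaded cluster receives more replicas
}
```

Because the weights track live utilization rather than a static RSP setting, the distribution automatically shifts toward clusters that free up capacity.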
Improved Replica Allocation Algorithm
KubeAdmiral refines KubeFed’s replica algorithm to avoid unexpected migrations during scale‑in/out. The new algorithm first computes the desired distribution, then adjusts using the distance between current and desired states, resulting in a more predictable replica placement.
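One way to realize "adjust using the distance between current and desired states" is to ensure a scale-out only ever adds replicas and a scale-in only ever removes them, so no cluster is drained and refilled in the same operation. The following is a hypothetical sketch of that property, not KubeAdmiral's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// adjust moves the current placement toward the desired one without
// unnecessary migrations: when the total grows, replicas are only added
// (where desired exceeds current); when it shrinks, they are only removed.
func adjust(current, desired map[string]int64) map[string]int64 {
	names := []string{}
	for n := range desired {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic order

	var curTotal, desTotal int64
	for _, n := range names {
		curTotal += current[n]
		desTotal += desired[n]
	}

	next := map[string]int64{}
	for _, n := range names {
		next[n] = current[n]
	}

	if desTotal >= curTotal { // scale-out: only add
		need := desTotal - curTotal
		for _, n := range names {
			if gap := desired[n] - next[n]; gap > 0 {
				add := min64(gap, need)
				next[n] += add
				need -= add
			}
		}
	} else { // scale-in: only remove
		need := curTotal - desTotal
		for _, n := range names {
			if gap := next[n] - desired[n]; gap > 0 {
				del := min64(gap, need)
				next[n] -= del
				need -= del
			}
		}
	}
	return next
}

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func main() {
	current := map[string]int64{"c1": 4, "c2": 2}
	desired := map[string]int64{"c1": 3, "c2": 6} // total grows from 6 to 9
	fmt.Println(adjust(current, desired))
	// prints map[c1:4 c2:5] — c1 keeps its running replicas instead of migrating one
}
```

Note how c1 is left above its desired count during the scale-out: honoring the current state avoids killing a healthy replica just to restart it elsewhere.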
Native Resource Support and Status Aggregation
KubeAdmiral accepts native Kubernetes resources (e.g., Deployment) and automatically converts them into internal federated objects, lowering the migration barrier. It also aggregates the status fields from all member clusters into a single view, providing a unified health snapshot.
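Status aggregation amounts to folding each member cluster's status into one federated view. A minimal sketch, using a trimmed-down stand-in for the Deployment status fields (the real aggregated status also preserves per-cluster detail):

```go
package main

import "fmt"

// DeploymentStatus is a simplified stand-in for the native status fields.
type DeploymentStatus struct {
	Replicas, ReadyReplicas, AvailableReplicas int32
}

// aggregate sums per-cluster statuses into a single health snapshot.
func aggregate(perCluster map[string]DeploymentStatus) DeploymentStatus {
	var out DeploymentStatus
	for _, s := range perCluster {
		out.Replicas += s.Replicas
		out.ReadyReplicas += s.ReadyReplicas
		out.AvailableReplicas += s.AvailableReplicas
	}
	return out
}

func main() {
	total := aggregate(map[string]DeploymentStatus{
		"cluster1": {Replicas: 3, ReadyReplicas: 3, AvailableReplicas: 3},
		"cluster2": {Replicas: 5, ReadyReplicas: 4, AvailableReplicas: 4},
	})
	fmt.Println(total)
	// prints {8 7 7} — 8 replicas federation-wide, 7 ready, 7 available
}
```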
Future Directions
Continue enhancing scheduling for stateful services and batch jobs, adding auto‑migration and cost‑aware scheduling.
Improve user experience with out‑of‑the‑box solutions to reduce cognitive load.
Boost observability, refine logs and metrics, and increase scheduler explainability.
Explore one‑click federation and multi‑cluster migration features.
KubeAdmiral has been incubated within ByteDance for years, powering the TCE platform that manages over 210,000 machines and 10 million pods across services like Douyin and Toutiao. It is now open‑source on GitHub.
GitHub: https://github.com/kubewharf/kubeadmiral
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.