KubeAdmiral 1.0.0: A New Cloud‑Native Multi‑Cluster Orchestration Engine
Version 1.0.0 of KubeAdmiral, ByteDance’s open‑source multi‑cluster orchestration engine, introduces native Kubernetes API compatibility, advanced scheduling policies, fault‑tolerant migration, global status aggregation, and extensive hybrid‑cloud support, enabling management of more than 210,000 machines across public and private clouds.
KubeAdmiral v1.0.0 is the first stable release of ByteDance’s open‑source multi‑cluster management engine, originally incubated internally and open‑sourced in July 2023. It now powers more than 210,000 machines and 10 million Pods for large‑scale services such as Douyin and Toutiao.
Multi‑cluster business background and KubeAdmiral’s evolution at ByteDance
ByteDance operates thousands of clusters across private data centers and multiple public‑cloud providers, leading to resource fragmentation, isolated clusters per business line, and complex operational overhead.
To address these challenges, the team first built on KubeFed v2 in 2019 but encountered limitations such as low resource utilization, inflexible scaling, limited scheduling semantics, and high integration cost.
Project Overview
KubeAdmiral (the name comes from “admiral”, the commander of a fleet) extends Kubernetes with multi‑cluster orchestration capabilities. It supports public‑cloud clusters (Volcengine, Alibaba Cloud, Huawei Cloud), private‑cloud clusters, and user‑managed clusters.
Architecture
The control plane runs in a host cluster and consists of:
Fed ETCD: stores federated Kubernetes resources.
Fed Kube Apiserver: native API server for federated resources.
Fed Kube Controller Manager: runs selected native controllers (e.g., namespace, garbage‑collector).
KubeAdmiral Controller: custom component handling cluster management, resource scheduling, fault migration, and status aggregation.
The KubeAdmiral Controller includes several sub‑controllers:
Federated Cluster Controller: manages the lifecycle of member clusters.
Federate Controller: creates a FederatedObject for each native resource.
Scheduler: decides how replicas are distributed across clusters.
Sync Controller: propagates federated objects to member clusters.
Status Controller: collects resource status from all clusters.
Core Features
Unified Multi‑Cluster Management
Supports public‑cloud, private‑cloud, and self‑managed Kubernetes clusters.
Multi‑Cluster Application Distribution
Compatible with native resources (Deployment, StatefulSet, ConfigMap), CRDs, and Helm charts.
Provides static‑weight, dynamic‑weight, and replica‑based distribution modes.
Cluster selection via explicit list, label selector, or affinity rules.
Follower scheduling: dependent resources such as ConfigMap, Secret, Service, and Ingress are dispatched to the same clusters as the workloads that use them.
Configurable rescheduling policies and maximum cluster count.
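The static‑weight mode can be pictured as a proportional split of a workload's replicas. The sketch below is only an illustration: the function name `split_replicas` and the largest‑remainder rounding rule are assumptions made for this example, not KubeAdmiral's documented algorithm.

```python
from typing import Dict


def split_replicas(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` replicas across clusters in proportion to static weights.

    Rounding uses the largest-remainder method so the parts always sum to
    `total`. This rule is an assumption for illustration only.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one cluster needs a positive weight")

    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for name, weight in weights.items():
        # Floor of the proportional share, plus what flooring discarded.
        shares[name], remainders[name] = divmod(total * weight, weight_sum)

    # Give the leftover replicas to the clusters with the largest remainders;
    # ties are broken by cluster name so the result is deterministic.
    leftover = total - sum(shares.values())
    ranked = sorted(weights, key=lambda n: (-remainders[n], n))
    for name in ranked[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    weights = {"Cluster-01": 40, "Cluster-02": 30, "Cluster-03": 40}
    print(split_replicas(11, weights))
    # {'Cluster-01': 4, 'Cluster-02': 3, 'Cluster-03': 4}
```

Weights are relative, so they do not need to sum to 100: 40/30/40 splits 11 replicas as 4/3/4.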
Fault Migration
Automatic migration of unschedulable replicas.
Manual or automatic eviction of workloads from unhealthy or decommissioned clusters.
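As a rough illustration of migrating unschedulable replicas, the sketch below moves replicas that cannot be scheduled in one cluster to clusters with spare room. The function `migrate_unschedulable`, the `spare` capacity estimate, and the "most spare capacity first" ordering are all hypothetical choices for this example; the source does not describe how KubeAdmiral picks target clusters.

```python
from typing import Dict


def migrate_unschedulable(
    desired: Dict[str, int],
    unschedulable: Dict[str, int],
    spare: Dict[str, int],
) -> Dict[str, int]:
    """Return a new placement with unschedulable replicas moved elsewhere.

    desired:        replicas currently assigned to each cluster
    unschedulable:  replicas per cluster that cannot be scheduled there
    spare:          hypothetical estimate of extra replicas a cluster can take
    """
    placement = dict(desired)
    capacity = dict(spare)
    for source, stuck in unschedulable.items():
        # Never move more replicas than the source actually holds.
        stuck = min(stuck, placement.get(source, 0))
        # Try targets with the most spare capacity first, never the source.
        for target in sorted(capacity, key=lambda c: (-capacity[c], c)):
            if stuck == 0:
                break
            if target == source or capacity[target] <= 0:
                continue
            moved = min(stuck, capacity[target])
            placement[source] -= moved
            placement[target] = placement.get(target, 0) + moved
            capacity[target] -= moved
            stuck -= moved
        # Anything still stuck stays on the source until capacity appears.
    return placement


if __name__ == "__main__":
    print(migrate_unschedulable(
        desired={"a": 5, "b": 3, "c": 2},
        unschedulable={"a": 3},
        spare={"b": 2, "c": 4},
    ))
    # {'a': 2, 'b': 3, 'c': 5}
```

The total replica count is preserved; only the distribution across clusters changes.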
Cross‑Cluster Autoscaling
Supports native and custom HPA across clusters.
Global Status Aggregation
Centralized status collection via Status Controller.
Aggregated status presented on native resources for a unified view.
Real‑time monitoring and automated fault detection/recovery.
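One way to picture status aggregation is summing the per‑cluster counters of a Deployment into a single view. The sketch below assumes the member statuses have already been collected into plain dictionaries; the function `aggregate_status` is hypothetical, and only the counter names are taken from the standard Kubernetes Deployment status.

```python
from typing import Dict, Mapping

# Counter fields from the standard Kubernetes Deployment status.
COUNTERS = ("replicas", "readyReplicas", "availableReplicas", "updatedReplicas")


def aggregate_status(
    member_status: Mapping[str, Mapping[str, int]]
) -> Dict[str, int]:
    """Sum Deployment status counters reported by each member cluster.

    A cluster that omits a field contributes 0 for it.
    """
    total = {name: 0 for name in COUNTERS}
    for status in member_status.values():
        for name in COUNTERS:
            total[name] += status.get(name, 0)
    return total


if __name__ == "__main__":
    collected = {
        "member1": {"replicas": 4, "readyReplicas": 4, "availableReplicas": 4,
                    "updatedReplicas": 4},
        "member2": {"replicas": 3, "readyReplicas": 2, "availableReplicas": 2,
                    "updatedReplicas": 3},
    }
    print(aggregate_status(collected))
    # {'replicas': 7, 'readyReplicas': 6, 'availableReplicas': 6, 'updatedReplicas': 7}
```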
Rich Scheduling Capabilities
Pluggable scheduler architecture (Filter, Score, Select, Replica) similar to kube‑scheduler.
Built‑in plugins implement policies defined in PropagationPolicy objects.
Extensible via HTTP‑based external plugins.
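The four stages named above (Filter, Score, Select, Replica) can be sketched as a small pipeline. Everything here is a simplified, hypothetical model: the `Cluster` type, the `free_cpu` signal, the `schedule` function, and the even replica split are assumptions for illustration and do not mirror KubeAdmiral's actual plugin interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cluster:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    free_cpu: int = 0  # hypothetical capacity signal used for scoring


FilterFn = Callable[[Cluster], bool]
ScoreFn = Callable[[Cluster], int]


def schedule(
    clusters: List[Cluster],
    filters: List[FilterFn],
    scorers: List[ScoreFn],
    max_clusters: int,
    replicas: int,
) -> Dict[str, int]:
    """Run a toy Filter -> Score -> Select -> Replica pipeline."""
    # Filter: drop clusters rejected by any filter plugin.
    feasible = [c for c in clusters if all(f(c) for f in filters)]
    # Score: sum the scores returned by every score plugin.
    scores = {c.name: sum(s(c) for s in scorers) for c in feasible}
    # Select: keep the highest-scoring clusters, at most max_clusters of them.
    ranked = sorted(feasible, key=lambda c: (-scores[c.name], c.name))
    selected = ranked[:max_clusters]
    if not selected:
        return {}
    # Replica: split evenly; higher-ranked clusters absorb the remainder.
    base, extra = divmod(replicas, len(selected))
    return {c.name: base + (1 if i < extra else 0)
            for i, c in enumerate(selected)}


if __name__ == "__main__":
    clusters = [
        Cluster("a", {"region": "beijing"}, free_cpu=8),
        Cluster("b", {"region": "shanghai"}, free_cpu=16),
        Cluster("c", {"region": "beijing"}, free_cpu=4),
    ]
    in_beijing = lambda c: c.labels.get("region") == "beijing"
    by_free_cpu = lambda c: c.free_cpu
    print(schedule(clusters, [in_beijing], [by_free_cpu],
                   max_clusters=2, replicas=5))
    # {'a': 3, 'c': 2}
```

Keeping each stage a plain function makes it easy to see where an additional plugin would slot in without touching the other stages.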
Policy Configuration Examples
<code>apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  placement:
    - cluster: Cluster-01
      preferences:
        weight: 40
    - cluster: Cluster-02
      preferences:
        weight: 30
    - cluster: Cluster-03
      preferences:
        weight: 40
  clusterSelector:
    IPv6: "true"
  clusterAffinity:
    - matchExpressions:
        - key: region
          operator: In
          values:
            - beijing
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
  schedulingMode: Divide
  reschedulePolicy:
    disableRescheduling: true
  maxClusters: 1
  disableFollowerScheduling: false</code>
<code>apiVersion: core.kubeadmiral.io/v1alpha1
kind: OverridePolicy
metadata:
  name: example
  namespace: default
spec:
  overrideRules:
    - targetClusters:
        clusters:
          - member1
          - member2
        clusterSelector:
          region: beijing
          az: zone1
      overriders:
        jsonpatch:
          - path: "/spec/template/spec/containers/0/image"
            operator: replace
            value: "nginx:test"
        image:
          - imagePath: "/spec/templates/0/container/image"
            operations:
              - imageComponent: Registry
                operator: addIfAbsent
                value: cluster.io</code>
Conclusion
KubeAdmiral v1.0.0 reflects a year of community and developer contributions, offering a production‑ready, cloud‑native multi‑cluster orchestration solution. It integrates tightly with Kubernetes APIs, supports extensive hybrid‑cloud scenarios, and provides extensible scheduling and fault‑tolerance mechanisms.
ByteDance Cloud Native
Sharing ByteDance's cloud-native technologies, technical practices, and developer events.