How KubeVela Scales: Load‑Testing Results and Performance Optimizations for v1.8
This guide reviews KubeVela's load‑testing history, shows how to configure a high‑performance, robust control plane, and describes the v1.8 optimizations: state‑persistence parallelism, AppKey indexing, informer cache reduction, direct cluster‑gateway connections and controller sharding. It then summarizes the single‑shard, multi‑shard, multi‑cluster and large‑scale experiments used to evaluate v1.8's scalability and stability.
KubeVela is an OAM‑based application delivery platform that has been in production for over three years. Systematic load‑testing experiments were performed to determine whether a single KubeVela control plane can host thousands of applications and to identify performance‑tuning strategies.
Load‑Testing History
August 2021 – Simple‑application test on a cluster with 1 000 virtual nodes and 12 000 applications; identified Kubernetes apiserver rate‑limiting as a bottleneck.
February 2022 – v1.2 workflow‑based application test; optional flags that disable ApplicationRevision boosted performance by more than 250%.
August 2022 – v1.6 workflow engine test; removed unnecessary CUE value initialization, cutting CPU usage by 75%.
February 2023 – v1.8 comprehensive test covering simple, large, sharded, multi‑cluster and continuous‑update scenarios.
Basic KubeVela Application Flow
User creates/updates/deletes an Application via VelaCLI, kubectl, REST API or VelaUX.
The request passes through the mutating and validating admission webhooks for defaulting and validation.
The Application object is stored in etcd; vela‑core informer watches for events.
Application controller adds a finalizer, computes the desired state, and creates/updates resources in the target cluster.
Controller executes workflow‑conditioned steps, performs garbage collection, and updates the Application status.
Configuring a High‑Performance, Robust Control Plane
Observability addons: install kube-state-metrics, prometheus-server and grafana (importing the Grafana dashboard is optional).
Remove webhooks to reduce request latency:
kubectl delete mutatingwebhookconfiguration kubevela-vela-core-admission
kubectl delete validatingwebhookconfiguration kubevela-vela-core-admission
(If keeping them, add --set admissionWebhooks to the install command.)
Enable sharding (v1.8+): --set sharding.enabled=true and install the vela-core-shard-manager addon.
Prefer internal network connections between the control plane and managed clusters to reduce latency and increase throughput.
Load‑Testing Methodology
After deploying the KubeVela control plane, the following steps can be used to simulate load:
Optionally use kubemark pods to create fake nodes that register as empty nodes with the apiserver, allowing massive pod counts without real resources.
Optionally spin up k3d or KinD clusters to emulate many managed clusters.
Use the official bulk‑deploy script (GitHub repository: https://github.com/kubevela/kubevela/tree/master/hack/load-test#use-of-application-bulk-deploy-scripts) to launch thousands of applications concurrently.
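To make the bulk‑deploy step concrete, here is a minimal sketch that generates a batch of small Application manifests. It is not the official script. The naming scheme, the load-test namespace (which would need to exist already) and the nginx image are assumptions chosen for illustration; only the Application apiVersion, kind and the webservice component type come from KubeVela itself.

```python
import json


def make_application(index: int, namespace: str = "load-test") -> dict:
    """Build one minimal Application manifest (hypothetical naming scheme)."""
    return {
        "apiVersion": "core.oam.dev/v1beta1",
        "kind": "Application",
        "metadata": {"name": f"app-{index:05d}", "namespace": namespace},
        "spec": {
            "components": [
                {
                    "name": f"comp-{index:05d}",
                    "type": "webservice",
                    # Placeholder image, assumed for illustration only.
                    "properties": {"image": "nginx:1.25"},
                }
            ]
        },
    }


def make_batch(start: int, count: int) -> dict:
    """Wrap `count` Applications in a v1 List so they can be applied together."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [make_application(i) for i in range(start, start + count)],
    }


if __name__ == "__main__":
    # Emit a tiny batch as JSON; kubectl accepts JSON as well as YAML.
    print(json.dumps(make_batch(0, 2), indent=2))
```

Redirecting the output to a file and running kubectl apply -f on it would submit the whole batch in one call. Running several such batches in parallel approximates the concurrent launch described above.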
Performance Optimizations (v1.8)
State‑persistence parallelism: increase concurrency to 5, improving persistence speed by ~30%.
AppKey indexing for list operations: add cache indexes to reduce list latency from 40 ms to 25 µs.
Filter unnecessary updates: skip empty patches when an application is stable, cutting reconciliation time by ~20%.
Informer cache reduction:
Strip managedFields and kubectl.kubernetes.io/last-applied-configuration before caching.
Share ComponentDefinition, TraitDefinition and WorkflowStepDefinition across ApplicationRevisions.
Disable ConfigMap informer cache to avoid caching unused ConfigMaps.
Direct cluster‑gateway connection: let the application controller talk directly to the cluster‑gateway, reducing multi‑cluster request latency by ~40%.
Controller sharding (v1.8.0): split the controller into multiple shards, each handling a subset of applications, enabling horizontal scaling without significant overhead.
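Two of the optimizations above, cache trimming and AppKey indexing, can be sketched in a few lines. This is an illustrative model in Python, not KubeVela's Go implementation. The AppKeyIndex class, the strip_for_cache helper and the namespace/name form of the app key are assumptions made for the example.

```python
from collections import defaultdict

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def strip_for_cache(obj: dict) -> dict:
    """Return a copy of obj without managedFields and the last-applied annotation."""
    meta = dict(obj.get("metadata", {}))
    meta.pop("managedFields", None)
    annotations = {
        k: v for k, v in meta.get("annotations", {}).items() if k != LAST_APPLIED
    }
    if annotations:
        meta["annotations"] = annotations
    else:
        meta.pop("annotations", None)
    return {**obj, "metadata": meta}


class AppKeyIndex:
    """Toy cache that indexes stored objects by the application that owns them."""

    def __init__(self):
        # app key (assumed "namespace/name") -> {object name -> trimmed object}
        self._by_app = defaultdict(dict)

    def add(self, app_key: str, obj: dict) -> None:
        trimmed = strip_for_cache(obj)
        self._by_app[app_key][trimmed["metadata"]["name"]] = trimmed

    def list_for_app(self, app_key: str) -> list:
        # One dictionary lookup instead of a scan over every cached object.
        return list(self._by_app.get(app_key, {}).values())
```

The indexed lookup costs the same regardless of how many objects the cache holds, which is the kind of change that can turn a millisecond‑scale list into a microsecond‑scale one. Trimming happens once at insert time, so every cached copy is smaller.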
Experimental Results
Single vs. Multi‑Shard
Three configurations were compared:
Single shard with 0.5 CPU / 1 Gi memory.
Three shards, each with 0.5 CPU / 1 Gi memory, handling 9 000 applications total.
Legacy single‑shard v1.7.5 (0.5 CPU / 1 Gi) as a baseline.
Three shards processed 3 000 applications each with similar CPU (≈0.1 core) and memory (≈320 MiB) usage as the single‑shard case, confirming horizontal scalability.
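The split of 9 000 applications across three shards can be illustrated with a small assignment sketch. The hash‑based strategy below is a stand‑in chosen for the example. The text does not describe how KubeVela actually schedules applications onto shards, and the shard names and app keys are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(app_key: str, shard_ids: list) -> str:
    """Pick a shard deterministically from a stable hash of the app key."""
    digest = hashlib.sha256(app_key.encode("utf-8")).digest()
    return shard_ids[int.from_bytes(digest[:8], "big") % len(shard_ids)]


shards = ["shard-0", "shard-1", "shard-2"]
apps = [f"default/app-{i}" for i in range(9000)]

# Count how many applications land on each shard.
load = Counter(shard_for(key, shards) for key in apps)
for shard_id in shards:
    print(shard_id, load[shard_id])
```

With a uniform hash each shard receives roughly a third of the applications, and the same application always maps to the same shard, so no two controllers reconcile it at once.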
Multi‑Cluster Tests
Deploying 3 000 applications across remote clusters (control plane in Tokyo, clusters in Hangzhou) increased average controller request latency from ~20 ms (single‑cluster) to ~77 ms, while CPU and memory remained comparable.
Large‑Scale Test (400 000–500 000 Applications)
Configuration:
5 controller shards (8 CPU / 32 Gi each).
5 cluster‑gateway replicas (2 CPU / 1 Gi each).
200 managed clusters simulated on a 64‑core VM.
The system delivered 400 000 small applications with reconciliation times around 70 ms. Scaling to 500 000 applications after reducing shard resources (8 CPU / 16 Gi) showed no significant increase in reconcile latency. Beyond ~400 000 applications, the bottleneck shifts to the underlying Kubernetes apiserver and etcd.
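The totals implied by this configuration are easy to work out. The sketch below uses only the figures quoted above. The even spread of applications across shards and clusters is an assumption, since the text does not state the actual distribution, and the Pool class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A group of identical replicas with per-replica resources."""
    replicas: int
    cpu_cores: int
    memory_gi: int

    def total_cpu(self) -> int:
        return self.replicas * self.cpu_cores

    def total_memory_gi(self) -> int:
        return self.replicas * self.memory_gi


controllers = Pool(replicas=5, cpu_cores=8, memory_gi=32)
gateways = Pool(replicas=5, cpu_cores=2, memory_gi=1)
apps, clusters = 400_000, 200

print("controller CPU / memory (Gi):", controllers.total_cpu(), controllers.total_memory_gi())
print("gateway CPU / memory (Gi):", gateways.total_cpu(), gateways.total_memory_gi())
# Even spread is an assumption, not a measured distribution.
print("apps per shard:", apps // controllers.replicas)
print("apps per cluster:", apps // clusters)
```

Under that assumption each shard handles about 80 000 applications and each managed cluster about 2 000, with 40 CPU cores and 160 Gi of memory allocated to the controllers in total.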
Key Findings
v1.8.0 outperforms v1.7.5 across all scenarios due to the optimizations listed above.
Controller sharding lets the control plane scale horizontally: each shard handling 3 000 applications used about as much CPU and memory as a single‑shard deployment.
In multi‑cluster deployments, network latency dominates; using internal network links mitigates this effect.
Beyond ~400 000 applications, the Kubernetes apiserver and etcd become the limiting factors.
Recommendations
Increase controller CPU, QPS and burst settings for higher parallelism.
Minimize cross‑cluster latency by using internal network connections and adequately sized cluster‑gateway pods.
Apply the listed optimizations (parallel persistence, AppKey indexing, update filtering, informer cache trimming, direct gateway, sharding) to achieve the best performance.
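To see why QPS and burst matter, it helps to model how a client‑side rate limiter behaves. Kubernetes clients commonly throttle requests with a token bucket, and the sketch below is a simplified model of that idea rather than the controller's actual code. The class name, the injected clock and the values 50 and 100 are illustrative assumptions.

```python
class TokenBucket:
    """Refills at `qps` tokens per second and holds at most `burst` tokens."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_accept(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def accepted_in_window(qps: float, burst: int, attempts_per_sec: int, seconds: int) -> int:
    """Count requests admitted when a caller attempts at a fixed, higher rate."""
    bucket = TokenBucket(qps, burst)
    accepted = 0
    for i in range(attempts_per_sec * seconds):
        if bucket.try_accept(now=i / attempts_per_sec):
            accepted += 1
    return accepted


if __name__ == "__main__":
    print(accepted_in_window(qps=50, burst=100, attempts_per_sec=1000, seconds=2))
    print(accepted_in_window(qps=100, burst=200, attempts_per_sec=1000, seconds=2))
```

Burst sets how many requests can go out immediately, while QPS caps the sustained rate afterwards. A controller reconciling many applications in parallel hits the sustained cap quickly, which is why raising CPU alone may not help if these limits stay low.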
Further Resources
Project repository: https://github.com/kubevela/kubevela
Official documentation: https://kubevela.io
Performance‑test blog post (2021): https://kubevela.net/blog/2021/08/30/kubevela-performance-test
Grafana dashboard for KubeVela system metrics: https://grafana.com/grafana/dashboards/18200-kubevela-system/
Controller‑sharding guide: https://kubevela.net/docs/platform-engineers/system-operation/controller-sharding
Alibaba Cloud Native