
How KubeVela Scales: Load‑Testing Results and Performance Optimizations for v1.8

This guide reviews three years of KubeVela load testing, walks through configuring a high‑performance, robust control plane, and describes the optimizations introduced in v1.8: state‑persistence parallelism, AppKey indexing, informer cache reduction, direct cluster‑gateway connections and controller sharding. It closes with results from single‑shard, multi‑shard, multi‑cluster and large‑scale experiments comparing v1.8 with earlier releases.

Alibaba Cloud Native

KubeVela is an OAM‑based application delivery platform that has been in production for over three years. Systematic load‑testing experiments were performed to determine whether a single KubeVela control plane can host thousands of applications and to identify performance‑tuning strategies.

Load‑Testing History

August 2021 – Simple‑application test on a cluster with 1 000 virtual nodes and 12 000 applications; identified Kubernetes apiserver rate‑limiting as a bottleneck.

February 2022 – v1.2 workflow‑based application test; optional flags that disable ApplicationRevision improved performance by more than 250%.

August 2022 – v1.6 workflow engine test; removed unnecessary CUE value initialization, cutting CPU usage by 75%.

February 2023 – v1.8 comprehensive test covering simple, large, sharded, multi‑cluster and continuous‑update scenarios.

Basic KubeVela Application Flow

User creates/updates/deletes an Application via VelaCLI, kubectl, REST API or VelaUX.

Request passes through a mutating webhook for validation and defaulting.

The Application object is stored in etcd; vela‑core informer watches for events.

Application controller adds a finalizer, computes the desired state, and creates/updates resources in the target cluster.

Controller executes workflow‑conditioned steps, performs garbage collection, and updates the Application status.
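The controller steps above can be sketched as a single reconcile pass. This is a toy model under stated assumptions, not KubeVela's implementation: resources are plain dictionaries, the finalizer name is hypothetical, the workflow stage is omitted, and the patch is a shallow key comparison.

```python
FINALIZER = "example.io/app-cleanup"  # hypothetical finalizer name


def diff(current: dict, desired: dict) -> dict:
    """Shallow patch: keys in `desired` whose values differ from `current`."""
    return {k: v for k, v in desired.items() if current.get(k) != v}


def reconcile(app: dict, cluster: dict) -> list:
    """Run one reconcile pass, mutating `cluster`; return the actions taken."""
    actions = []

    # Step 1: make sure the finalizer is present so deletion can be cleaned up.
    finalizers = app["metadata"].setdefault("finalizers", [])
    if FINALIZER not in finalizers:
        finalizers.append(FINALIZER)
        actions.append("add-finalizer")

    # Step 2: desired state, here one resource per component (an assumption).
    desired = {c["name"]: c["properties"] for c in app["spec"]["components"]}

    # Step 3: create or update resources, but only when something changed.
    for name, props in desired.items():
        patch = diff(cluster.get(name, {}), props)
        if patch:
            cluster.setdefault(name, {}).update(patch)
            actions.append(f"apply:{name}")

    # Step 4: garbage-collect resources that no component declares any more.
    for name in [n for n in cluster if n not in desired]:
        del cluster[name]
        actions.append(f"gc:{name}")

    # Step 5: record the observed state on the Application.
    app["status"] = {"phase": "running", "resources": sorted(cluster)}
    actions.append("update-status")
    return actions
```

Running the function twice on an unchanged application makes the second pass apply nothing, which is the behaviour a level‑triggered controller depends on.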

Configuring a High‑Performance, Robust Control Plane

Observability plugins: install kube-state-metrics, prometheus-server and grafana (importing the Grafana dashboard is optional).

Remove webhooks to reduce request latency:

kubectl delete mutatingwebhookconfiguration kubevela-vela-core-admission
kubectl delete validatingwebhookconfiguration kubevela-vela-core-admission

(If keeping them, add --set admissionWebhooks to the install command.)

Enable sharding (v1.8+): --set sharding.enabled=true and install the vela-core-shard-manager plugin.

Prefer internal network connections between the control plane and managed clusters to reduce latency and increase throughput.

Load‑Testing Methodology

After deploying the KubeVela control plane, the following steps can be used to simulate load:

Optionally use kubemark pods to create fake nodes that register as empty nodes with the apiserver, allowing massive pod counts without real resources.

Optionally spin up k3d or KinD clusters to emulate many managed clusters.

Use the official bulk‑deploy script (https://github.com/kubevela/kubevela/tree/master/hack/load-test#use-of-application-bulk-deploy-scripts) to launch thousands of applications concurrently.
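The official script is not reproduced here. As a rough illustration of what bulk deployment involves, the sketch below generates batches of minimal Application manifests. The application names, the load-test namespace (which would have to exist already) and the nginx image are assumptions chosen for the example; webservice is a built‑in KubeVela component type.

```python
import json


def make_application(index: int, namespace: str = "load-test") -> dict:
    """Build one minimal Application manifest (names and image are illustrative)."""
    return {
        "apiVersion": "core.oam.dev/v1beta1",
        "kind": "Application",
        "metadata": {"name": f"load-test-app-{index}", "namespace": namespace},
        "spec": {
            "components": [
                {
                    "name": f"comp-{index}",
                    "type": "webservice",
                    "properties": {"image": "nginx"},
                }
            ]
        },
    }


def make_batch(start: int, count: int) -> dict:
    """Wrap a range of Applications in a v1 List so one apply call submits them."""
    items = [make_application(i) for i in range(start, start + count)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


def render(batch: dict) -> str:
    """Serialize a batch as JSON, which kubectl accepts in place of YAML."""
    return json.dumps(batch, indent=2)
```

The output of render(make_batch(0, 100)) could be piped to kubectl apply -f -, with several such batches submitted in parallel to approximate concurrent load.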

Performance Optimizations (v1.8)

State‑persistence parallelism: increase concurrency to 5, improving persistence speed by ~30%.

AppKey indexing for list operations: add cache indexes to reduce list latency from 40 ms to 25 µs.

Filter unnecessary updates: skip empty patches when an application is stable, cutting reconciliation time by ~20%.

Informer cache reduction:

Strip managedFields and kubectl.kubernetes.io/last-applied-configuration before caching.

Share ComponentDefinition, TraitDefinition and WorkflowStepDefinition across ApplicationRevisions.

Disable the ConfigMap informer cache to avoid caching unused ConfigMaps.

Direct cluster‑gateway connection: let the application controller talk directly to the cluster‑gateway, reducing multi‑cluster request latency by ~40%.

Controller sharding (v1.8.0): split the controller into multiple shards, each handling a subset of applications, enabling horizontal scaling without significant overhead.
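Two of the cache‑related ideas above, trimming objects before caching and indexing them by application key, can be illustrated with a toy in‑memory cache. This is a sketch only: the label used as the application key is hypothetical, and the real controller works with Kubernetes informers rather than Python dictionaries.

```python
from collections import defaultdict

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"
APP_KEY_LABEL = "example.io/app-key"  # hypothetical label tying an object to its app


def trim(obj: dict) -> dict:
    """Return a copy of obj without managedFields and the last-applied annotation."""
    meta = dict(obj.get("metadata", {}))
    meta.pop("managedFields", None)
    annotations = dict(meta.get("annotations", {}))
    annotations.pop(LAST_APPLIED, None)
    if annotations:
        meta["annotations"] = annotations
    else:
        meta.pop("annotations", None)
    return {**obj, "metadata": meta}


class IndexedCache:
    """Toy object cache with a secondary index keyed by application."""

    def __init__(self) -> None:
        self._objects = {}               # object name -> trimmed object
        self._by_app = defaultdict(set)  # app key -> names of its objects

    def add(self, obj: dict) -> None:
        # Sketch limitation: re-adding an object under a new app key
        # does not clean up the old index entry.
        trimmed = trim(obj)
        name = trimmed["metadata"]["name"]
        app_key = trimmed["metadata"].get("labels", {}).get(APP_KEY_LABEL)
        self._objects[name] = trimmed
        if app_key is not None:
            self._by_app[app_key].add(name)

    def list_by_scan(self, app_key: str) -> list:
        """Unindexed list: inspects every cached object."""
        return sorted(
            name for name, obj in self._objects.items()
            if obj["metadata"].get("labels", {}).get(APP_KEY_LABEL) == app_key
        )

    def list_by_index(self, app_key: str) -> list:
        """Indexed list: one dictionary lookup, independent of cache size."""
        return sorted(self._by_app.get(app_key, ()))
```

Both list methods return the same names. The scan grows with the total number of cached objects while the index lookup does not, which is the kind of gap the reported drop from milliseconds to microseconds reflects.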

Experimental Results

Single vs. Multi‑Shard

Three configurations were compared:

Single shard with 0.5 CPU / 1 Gi memory.

Three shards, each with 0.5 CPU / 1 Gi memory, handling 9 000 applications total.

Legacy single‑shard v1.7.5 (0.5 CPU / 1 Gi) as a baseline.

Each of the three shards processed 3 000 applications with CPU (≈0.1 core) and memory (≈320 MiB) usage similar to the single‑shard case, confirming horizontal scalability.
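To make the idea of splitting applications across shards concrete, here is a minimal sketch that assigns each application to a shard by hashing its namespace/name key. Hash‑modulo assignment is a stand‑in chosen for illustration; how KubeVela actually schedules applications onto shards may differ, and the shard and application names are made up.

```python
import hashlib
from collections import Counter


def assign_shard(app_key: str, shard_ids: list) -> str:
    """Pick a shard for an application by hashing its namespace/name key."""
    digest = hashlib.sha256(app_key.encode("utf-8")).digest()
    return shard_ids[int.from_bytes(digest[:8], "big") % len(shard_ids)]


def distribution(app_keys: list, shard_ids: list) -> Counter:
    """Count how many applications land on each shard."""
    return Counter(assign_shard(key, shard_ids) for key in app_keys)


# Mirror the experiment's shape: 9 000 applications over three shards.
shards = ["shard-0", "shard-1", "shard-2"]
apps = [f"load-test/app-{i}" for i in range(9000)]
counts = distribution(apps, shards)
```

The assignment is deterministic, so an application always maps to the same shard as long as the shard list is unchanged. A hash gives an approximately even split, close to but not exactly 3 000 per shard.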

Multi‑Cluster Tests

Deploying 3 000 applications across remote clusters (control plane in Tokyo, clusters in Hangzhou) increased average controller request latency from ~20 ms (single‑cluster) to ~77 ms, while CPU and memory remained comparable.

Large‑Scale Test (400 000–500 000 Applications)

Configuration:

5 controller shards (8 CPU / 32 Gi each).

5 cluster‑gateway replicas (2 CPU / 1 Gi each).

200 managed clusters simulated on a 64‑core VM.

The system delivered 400 000 small applications with reconciliation times around 70 ms. Scaling to 500 000 applications after reducing shard resources (8 CPU / 16 Gi) showed no significant increase in reconcile latency. Beyond ~400 000 applications, the bottleneck shifts to the underlying Kubernetes apiserver and etcd.

Key Findings

v1.8.0 outperforms v1.7.5 across all scenarios due to the optimizations listed above.

Controller sharding lets the control plane scale horizontally: each of three shards handling 3 000 applications used about as much CPU and memory as the single‑shard case.

In multi‑cluster deployments, network latency dominates; using internal network links mitigates this effect.

Beyond ~400 000 applications, the Kubernetes apiserver and etcd become the limiting factors.

Recommendations

Increase controller CPU, QPS and burst settings for higher parallelism.

Minimize cross‑cluster latency by using internal network connections and adequately sized cluster‑gateway pods.

Apply the listed optimizations (parallel persistence, AppKey indexing, update filtering, informer cache trimming, direct gateway, sharding) to achieve the best performance.
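QPS and burst settings are typically enforced on the client side by a token‑bucket rate limiter. The sketch below shows how the two values interact. It is a simplified model with explicit timestamps for determinism, not the limiter the controller ships with.

```python
class TokenBucket:
    """Token-bucket limiter: `qps` tokens refill per second, up to `burst` tokens."""

    def __init__(self, qps: float, burst: int, now: float = 0.0) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Take one token if available at time `now` (seconds); never blocks."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted(limiter: TokenBucket, timestamps: list) -> int:
    """Count how many requests arriving at the given times pass the limiter."""
    return sum(1 for t in timestamps if limiter.try_acquire(t))
```

With qps=10 and burst=5, twenty requests arriving at the same instant see only five admitted. Raising burst lets more of a spike through immediately, while raising QPS increases the sustained rate, which is why both are tuned together with CPU.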

Further Resources

Project repository: https://github.com/kubevela/kubevela

Official documentation: https://kubevela.io

Performance‑test blog post (2021): https://kubevela.net/blog/2021/08/30/kubevela-performance-test

Grafana dashboard for KubeVela system metrics: https://grafana.com/grafana/dashboards/18200-kubevela-system/

Controller‑sharding guide: https://kubevela.net/docs/platform-engineers/system-operation/controller-sharding
Figure: KubeVela multi‑cluster connection architecture
Figure: Controller sharding architecture