Can Eventual Consistency Boost Kubernetes Performance at the Edge?
This article examines how the strong‑consistency design of etcd limits Kubernetes scalability and latency in edge environments, presents experimental results on etcd performance, and proposes an eventual‑consistency storage layer to improve performance, availability, and scalability for edge deployments.
1 Introduction
In recent years Kubernetes has become the dominant container‑orchestration platform. Edge scenarios, however, involve thousands of nodes with little CPU and RAM, and they demand higher performance, higher availability, and scheduling that scales. Kubernetes stores all control‑plane state in etcd, a strongly consistent key‑value store. Larger etcd clusters improve availability but increase request latency and reduce throughput. Because about 30% of Kubernetes requests are writes, this cost falls directly on control‑plane latency and availability, which makes Kubernetes a poor fit for edge use cases with strict performance requirements.
2 Kubernetes and etcd
Kubernetes groups containers into Pods, which are scheduled onto worker nodes and managed by the Kubelet. The control plane consists of stateless components that can be horizontally scaled, but all desired state is persisted in an etcd cluster. etcd’s strong consistency makes it a performance bottleneck, especially when deployed across multiple data‑centers where network partitions force a trade‑off between availability and consistency (CAP theorem).
2.1 etcd
etcd is a strongly consistent distributed key‑value store that uses the Raft consensus algorithm. It is typically deployed as a 3‑ or 5‑node cluster to balance high availability against the overhead of strong consistency. Because every write must be replicated to a majority of nodes before it is acknowledged, write latency grows with the size of the cluster.
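The majority requirement can be made concrete with a little arithmetic. The sketch below is illustrative only; the function names `quorum_size` and `tolerated_failures` are made up for this example and are not part of etcd.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

A 4‑node cluster tolerates no more failures than a 3‑node one, which is one reason odd sizes such as 3 and 5 are the usual choice.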
2.2 Scheduling
When a ReplicaSet’s replica count changes, the ReplicaSet controller, which watches for such updates, creates a new Pod object and writes it to etcd. That write must reach quorum before the scheduler can see the Pod and assign it to a node. After scheduling, the updated Pod is written back to etcd, which triggers further watch events and further quorum writes. Each of these steps adds latency, and in large clusters the control‑plane leader becomes a bottleneck.
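To see how the writes stack up, the following toy model replays the sequence described above. It is a sketch under stated assumptions: `ToyStore`, `replicaset_controller`, and `scheduler` are hypothetical stand‑ins, watchers fire synchronously (real watches are asynchronous), and every `put` is simply counted as one write that would need a quorum in etcd.

```python
from typing import Callable, Dict, List

Watcher = Callable[[str, dict], None]


class ToyStore:
    """In-memory stand-in for etcd that counts writes (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, dict] = {}
        self.watchers: List[Watcher] = []
        self.quorum_writes = 0

    def put(self, key: str, value: dict) -> None:
        # In etcd, each of these writes would have to reach a majority.
        self.quorum_writes += 1
        self.data[key] = value
        for notify in list(self.watchers):
            notify(key, value)


def replicaset_controller(store: ToyStore) -> None:
    """Create one Pod object per desired replica when a ReplicaSet is seen."""

    def on_event(key: str, obj: dict) -> None:
        if obj.get("kind") != "ReplicaSet":
            return
        for i in range(obj["replicas"]):
            pod_key = f"pods/{obj['name']}-{i}"
            if pod_key not in store.data:
                store.put(pod_key, {"kind": "Pod", "node": None})

    store.watchers.append(on_event)


def scheduler(store: ToyStore, node: str = "node-a") -> None:
    """Bind every unscheduled Pod to a node and write it back."""

    def on_event(key: str, obj: dict) -> None:
        if obj.get("kind") == "Pod" and obj["node"] is None:
            store.put(key, {**obj, "node": node})

    store.watchers.append(on_event)


def count_writes(replicas: int) -> int:
    store = ToyStore()
    replicaset_controller(store)
    scheduler(store)
    store.put("replicasets/web",
              {"kind": "ReplicaSet", "name": "web", "replicas": replicas})
    return store.quorum_writes


if __name__ == "__main__":
    print(count_writes(2))
```

In this model, scaling to N replicas costs one write for the ReplicaSet plus two per Pod (create, then bind), and that is before any later status updates are counted.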
3 etcd Performance
Experiments using the official etcd benchmark tool measured put and linearizable range operations on clusters of varying size. Docker containers (2 CPUs, 1 GB RAM, SSD storage) simulated the etcd nodes. Each test ran 10 repetitions with 100 000 operations from 1 000 clients. The results show that as the number of etcd nodes increases, write latency rises and throughput falls sharply because of the majority‑write requirement.
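One part of this effect can be reproduced with a toy model, which is not the benchmark used in the experiments. The sketch assumes that the leader sends a write to all followers at once, that follower round‑trip times are independent and uniformly distributed between 1 and 10 ms (an arbitrary choice), and that the write commits once a majority including the leader has it. Disk syncs, leader CPU cost, and network contention are ignored, so the model understates the real degradation.

```python
import random
import statistics


def commit_latency(members: int, rng: random.Random) -> float:
    """Latency of one write: wait for the fastest (quorum - 1) follower acks."""
    followers = members - 1
    acks_needed = members // 2  # quorum minus the leader's own copy
    if acks_needed == 0:
        return 0.0
    rtts = sorted(rng.uniform(1.0, 10.0) for _ in range(followers))
    return rtts[acks_needed - 1]


def mean_commit_latency(members: int, trials: int = 20000, seed: int = 0) -> float:
    """Average simulated commit latency in milliseconds."""
    rng = random.Random(seed)
    return statistics.fmean(commit_latency(members, rng) for _ in range(trials))


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} members: {mean_commit_latency(n):.2f} ms")
```

Even in this idealized setting the mean commit latency grows with cluster size, because a larger majority means waiting for a slower follower.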
4 Eventual‑Consistency Storage
The proposal is to replace etcd with an eventually consistent storage layer. It must expose the same API to Kubernetes while allowing reads and writes to complete on a single node without immediate coordination with other nodes. Conflict‑free Replicated Data Types (CRDTs) provide a way to resolve divergent updates through delayed synchronization.
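A grow‑only counter is perhaps the smallest CRDT that shows the idea: each replica accepts writes locally, and replicas converge whenever they later exchange state. This is a generic textbook example, not the data type the proposed store would use; the class name `GCounter` is chosen for illustration.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter: one slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A local write: no coordination with other replicas.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum is commutative, associative, and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)   # accepted on replica a only
    b.increment(3)   # accepted on replica b only
    a.merge(b)       # delayed synchronization, in either order
    b.merge(a)
    print(a.value, b.value)
```

Because the merge can run in any order and any number of times, synchronization can happen in the background rather than on the write path.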
4.1 etcd API Compatibility
The new storage must present an API compatible with the existing etcd API so that Kubernetes components require no changes.
4.2 Delayed Synchronization
With state‑based or operation‑based CRDTs, updates propagate asynchronously, which takes replication off the critical path and reduces latency. Kubernetes resources, originally stored as protobuf, can be transformed into JSON CRDTs for eventual consistency.
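As a rough illustration of the state‑based variant applied to a JSON‑like resource, the sketch below stores each field path with a logical timestamp and replica id and merges by last‑writer‑wins. This is a deliberate simplification and an assumption of this example: a full JSON CRDT also handles nested structure, lists, and deletions, and last‑writer‑wins silently drops one of two concurrent updates to the same field. The names `set_field`, `merge`, and `to_json_fields` are hypothetical, and each (timestamp, replica) pair is assumed to be unique.

```python
from typing import Any, Dict, Tuple

# field path -> (logical timestamp, replica id, value)
Entry = Tuple[int, str, Any]
LWWMap = Dict[str, Entry]


def _newer(candidate: Entry, current: Entry) -> bool:
    # Order by timestamp, then replica id as a deterministic tie-break.
    return candidate[:2] > current[:2]


def set_field(state: LWWMap, path: str, value: Any,
              timestamp: int, replica: str) -> None:
    """Apply a local update to one field of a resource."""
    candidate = (timestamp, replica, value)
    current = state.get(path)
    if current is None or _newer(candidate, current):
        state[path] = candidate


def merge(a: LWWMap, b: LWWMap) -> LWWMap:
    """Join two replica states; the newest entry per field wins."""
    merged = dict(a)
    for path, entry in b.items():
        current = merged.get(path)
        if current is None or _newer(entry, current):
            merged[path] = entry
    return merged


def to_json_fields(state: LWWMap) -> Dict[str, Any]:
    """Strip CRDT metadata and return plain field values."""
    return {path: entry[2] for path, entry in state.items()}


if __name__ == "__main__":
    r1: LWWMap = {}
    r2: LWWMap = {}
    set_field(r1, "spec.replicas", 3, timestamp=1, replica="r1")
    set_field(r2, "spec.replicas", 5, timestamp=2, replica="r2")
    set_field(r2, "metadata.labels.app", "web", timestamp=1, replica="r2")
    print(to_json_fields(merge(r1, r2)))
```

Both replicas arrive at the same state whichever one merges first, which is the property that makes delayed synchronization safe.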
4.3 Impact on Kubernetes
Because a significant fraction of etcd requests are transactional writes, removing strong consistency eliminates many write‑quorum delays. Although occasional stale reads may occur, Kubernetes controllers can correct inconsistencies over time, and the overall system gains higher throughput and lower latency.
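The claim that controllers correct inconsistencies over time rests on their level‑triggered design: each pass compares desired state with whatever is currently observed. The sketch below is a minimal model of that loop, assuming a hypothetical `reconcile` function and Pod naming scheme. In the first pass the controller acts on a stale view and creates one Pod too many; in the second pass it sees the true state and removes the surplus.

```python
from typing import List, Set, Tuple


def reconcile(desired: int, observed: Set[str],
              next_id: int) -> Tuple[List[str], List[str]]:
    """Return (pods_to_create, pods_to_delete) based on the observed view."""
    if len(observed) < desired:
        missing = desired - len(observed)
        return [f"pod-{next_id + i}" for i in range(missing)], []
    return [], sorted(observed)[desired:]


def apply(actual: Set[str], creates: List[str], deletes: List[str]) -> Set[str]:
    """Apply the controller's decision to the real cluster state."""
    return (actual | set(creates)) - set(deletes)


def demo() -> List[int]:
    desired = 3
    actual = {"pod-0", "pod-1", "pod-2"}
    sizes = []

    # Pass 1: a stale read hides pod-2, so the controller over-provisions.
    stale_view = {"pod-0", "pod-1"}
    actual = apply(actual, *reconcile(desired, stale_view, next_id=3))
    sizes.append(len(actual))

    # Pass 2: the view has caught up, so the surplus Pod is removed.
    actual = apply(actual, *reconcile(desired, set(actual), next_id=4))
    sizes.append(len(actual))
    return sizes


if __name__ == "__main__":
    print(demo())
```

The price of the stale read is a temporarily over‑provisioned Pod, not a permanently wrong state.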
5 Architecture Implementation
The eventually consistent store can be deployed inside the Kubernetes cluster and benefit from native horizontal pod autoscaling, unlike etcd, whose consistency constraints prevent it from scaling horizontally. This enables decentralized control‑plane components, reduces scheduling latency, and supports edge‑native function‑as‑a‑service workloads.
6 Related Work
Edge scenarios such as 5G, network‑computing, and elastic CDNs require low‑latency, reliable orchestration. Prior efforts like K3s, KubeEdge, and federated Kubernetes still rely on etcd’s strong consistency, limiting scalability. Research on CRDTs, adaptive consistency, and decentralized control planes informs the proposed design.
7 Conclusion
The dependence of Kubernetes on etcd’s strong consistency creates a performance, availability, and scalability bottleneck in large‑scale edge deployments. Replacing etcd with a decentralized, eventually consistent storage layer can dramatically improve latency, throughput, and resilience, paving the way for more efficient edge orchestration.
