How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Etcd Best Practices
This guide explains how to adjust node quotas, kernel parameters, and etcd configuration for large Kubernetes clusters. It covers cloud provider quota limits, reference GCE and Alibaba Cloud settings, kube-apiserver tuning, and pod resource best practices for reliable scaling and performance.
1. Node Quotas and Kernel Parameter Adjustments
As a Kubernetes cluster on a public cloud grows, you may hit the provider's quota limits and need to request increases. Quotas commonly worth enlarging include:
Number of virtual machines
Number of vCPUs
Number of internal IP addresses
Number of external IP addresses
Number of security groups
Number of route tables
Persistent storage size
Reference GCE master node types based on node count:
1‑5 nodes: n1-standard-1
6‑10 nodes: n1-standard-2
11‑100 nodes: n1-standard-4
101‑250 nodes: n1-standard-8
251‑500 nodes: n1-standard-16
More than 500 nodes: n1-standard-32
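The sizing table above can be sketched as a small shell helper; pick_master_type is a hypothetical name, and the thresholds simply mirror the table rather than any official formula:
<code># pick_master_type: return the reference GCE master machine type
# for a given worker-node count (thresholds from the table above).
pick_master_type() {
    nodes=$1
    if   [ "$nodes" -le 5 ];   then echo n1-standard-1
    elif [ "$nodes" -le 10 ];  then echo n1-standard-2
    elif [ "$nodes" -le 100 ]; then echo n1-standard-4
    elif [ "$nodes" -le 250 ]; then echo n1-standard-8
    elif [ "$nodes" -le 500 ]; then echo n1-standard-16
    else                            echo n1-standard-32
    fi
}

pick_master_type 120   # prints n1-standard-8
</code>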
Reference Alibaba Cloud configuration (kernel parameters):
<code># fs.file-max: system-wide maximum number of open file handles
fs.file-max=1000000
# ARP cache size
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
# conntrack max entries
net.netfilter.nf_conntrack_max=10485760
# netdev max backlog
net.core.netdev_max_backlog=10000
# conntrack established-TCP timeout (seconds)
net.netfilter.nf_conntrack_tcp_timeout_established=300
# conntrack hash table buckets
net.netfilter.nf_conntrack_buckets=655360
# inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
</code>
2. Etcd Database
A highly available etcd cluster can be built with etcd-operator, which automates the creation, scaling, backup, and upgrade of etcd clusters:
Create/Destroy: automatic deployment and removal of etcd clusters.
Resize: dynamic scaling of the cluster.
Backup: supports data backup and cluster restoration.
Upgrade: upgrade without service interruption.
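For illustration, a minimal EtcdCluster custom resource for etcd-operator might look like the following; the cluster name, size, and version are placeholders, not recommended values:
<code>apiVersion: "etcd.database.coreos.com/v1beta2"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster   # placeholder name
spec:
  size: 3            # number of etcd members
  version: "3.2.13"  # placeholder etcd version
</code>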
Additional recommendations:
Use SSD storage for etcd.
Increase --quota-backend-bytes (default 2 GB) to raise the etcd storage limit.
Store kube-apiserver events in a dedicated etcd cluster, separate from the main cluster state.
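The last two recommendations can be sketched as flag fragments; the endpoint addresses below are placeholders, and the quota value is just an example of raising the 2 GB default:
<code># Raise the etcd backend quota, e.g. from the 2 GB default to 8 GB:
etcd --quota-backend-bytes=8589934592 ...

# Route events to a dedicated etcd cluster via kube-apiserver
# (endpoint addresses are placeholders):
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379 \
  --etcd-servers-overrides=/events#https://etcd-events-0:2379 \
  ...
</code>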
3. Kube APIServer Configuration
For node counts ≥ 3000, set:
<code>--max-requests-inflight=3000
--max-mutating-requests-inflight=1000
</code>
For node counts between 1000 and 3000, set:
<code>--max-requests-inflight=1500
--max-mutating-requests-inflight=500
</code>
The memory target (in MB) can be calculated from the node count:
<code>--target-ram-mb=node_nums * 60
</code>
4. Pod Configuration
Best practices for pods include setting resource requests and limits, e.g.:
<code>spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage
</code>
Kubernetes assigns each pod a QoS class based on these settings: Guaranteed, Burstable, or BestEffort. Under resource pressure, the kubelet evicts BestEffort pods first, then Burstable, with Guaranteed pods evicted last.
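For example, a container whose requests equal its limits is placed in the Guaranteed class; the container name, image, and values below are placeholders:
<code>spec:
  containers:
  - name: app            # placeholder
    image: nginx:1.25    # placeholder
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"      # requests == limits -> Guaranteed QoS
        memory: "512Mi"
</code>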
Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads across nodes. For example, kube-dns uses pod anti-affinity to avoid co-locating its replicas:
<code>affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        topologyKey: kubernetes.io/hostname
</code>
Prefer managing containers through controllers such as Deployment, StatefulSet, DaemonSet, or Job rather than running bare pods. Adjust the scheduler's and controller-manager's API QPS settings as needed (e.g., --kube-api-qps=100, --kube-api-burst=100).
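As a sketch, running a container under a Deployment instead of as a bare pod looks like the following; the names, image, replica count, and resource values are all placeholders:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: web               # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25  # placeholder
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
</code>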