How to Scale Kubernetes Clusters: Quotas, Kernel Tweaks, and Etcd Best Practices
This guide explains how to raise node quotas, tune kernel parameters, configure a high‑availability etcd cluster, and choose suitable Kube‑APIServer and Pod settings for large‑scale Kubernetes deployments, so the cluster stays stable and performant as it grows.
1. Node Quotas and Kernel Parameter Tuning
As a Kubernetes cluster on a public cloud grows, you may hit quota limits and need to request increases from the cloud provider. Quotas worth raising include:
Number of virtual machines
Number of vCPUs
Number of internal IP addresses
Number of external IP addresses
Number of security groups
Number of route tables
Persistent storage size
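On GCE, for instance, current quota limits and usage can be inspected before requesting increases (the project ID and region below are placeholders):
<code># Project-wide quotas
gcloud compute project-info describe --project my-project
# Regional quotas (vCPUs, in-use addresses, etc.)
gcloud compute regions describe us-central1</code>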
Reference GCE master node configurations as node count increases:
1‑5 nodes: n1-standard-1
6‑10 nodes: n1-standard-2
11‑100 nodes: n1-standard-4
101‑250 nodes: n1-standard-8
251‑500 nodes: n1-standard-16
More than 500 nodes: n1-standard-32
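As a quick illustration (not any official API), the sizing table above can be encoded as a small lookup helper:

```python
# Illustrative lookup for the GCE master sizing table above.
# The tier boundaries mirror the table; nothing here is an official API.
def master_machine_type(node_count: int) -> str:
    """Return the suggested GCE master machine type for a node count."""
    tiers = [
        (5, "n1-standard-1"),
        (10, "n1-standard-2"),
        (100, "n1-standard-4"),
        (250, "n1-standard-8"),
        (500, "n1-standard-16"),
    ]
    for upper_bound, machine_type in tiers:
        if node_count <= upper_bound:
            return machine_type
    return "n1-standard-32"  # more than 500 nodes

print(master_machine_type(150))  # → n1-standard-8
```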
Alibaba Cloud kernel parameters (example):
<code># Maximum number of open file handles
fs.file-max=1000000
# ARP cache sizes
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
# Netfilter connection tracking limits
net.netfilter.nf_conntrack_max=10485760
net.core.netdev_max_backlog=10000
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360
# Inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288</code>
2. Etcd Database
High‑availability etcd cluster with automatic scaling
A common approach is to run the etcd operator (from CoreOS), which simplifies management of this stateful application: the operator extends the Kubernetes API to create, configure, and manage etcd clusters automatically.
Etcd operator features:
Create/destroy: automatically deploy and delete etcd clusters without manual intervention.
Resize: dynamically scale the etcd cluster up or down.
Backup: support data backup and cluster restoration.
Upgrade: upgrade the etcd cluster without service interruption.
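As a sketch, a cluster managed by the etcd operator is declared as a custom resource; the name, size, and version below are illustrative:
<code>apiVersion: "etcd.database.coreos.com/v1beta2"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3            # resize the cluster by editing this field
  version: "3.2.13"  # bump to trigger a rolling upgrade</code>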
Additional etcd settings:
Use SSD storage for etcd.
Set --quota-backend-bytes to raise the storage limit (default 2 GB).
Configure a dedicated etcd cluster for kube‑apiserver events.
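For example (endpoints are placeholders), the quota is raised on the etcd command line, and the kube‑apiserver flag --etcd-servers-overrides routes high‑churn events to their own etcd cluster:
<code># etcd: raise the backend quota from the 2 GB default to 8 GB
etcd --quota-backend-bytes=8589934592

# kube-apiserver: keep events in a dedicated etcd cluster
kube-apiserver \
  --etcd-servers=https://etcd-main:2379 \
  --etcd-servers-overrides=/events#https://etcd-events:2379</code>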
3. Kube‑APIServer Configuration
For node counts ≥ 3000, recommended settings:
<code>--max-requests-inflight=3000
--max-mutating-requests-inflight=1000</code>
For node counts between 1000 and 3000:
<code>--max-requests-inflight=1500
--max-mutating-requests-inflight=500</code>
Memory target (in MB) based on node count:
<code>--target-ram-mb=node_nums * 60</code>5. Pod Configuration
Best practices for running Pods include setting resource requests and limits:
<code>spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage</code>
Kubernetes classifies Pods into QoS classes based on these settings:
Guaranteed
Burstable
BestEffort
When node resources are scarce, the kubelet evicts Pods in order of QoS class: BestEffort first, then Burstable, then Guaranteed.
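For instance, a Pod whose requests equal its limits for every container falls into the Guaranteed class (pod name and image are illustrative):
<code>apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:             # equal to requests → QoS class Guaranteed
        cpu: "500m"
        memory: "256Mi"</code>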
Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads across nodes. Example for kube‑dns:
<code>affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        topologyKey: kubernetes.io/hostname</code>
Prefer managing containers with controllers such as Deployment, StatefulSet, DaemonSet, and Job.
Kube‑scheduler configuration:
<code>--kube-api-qps=100 # default 50</code>
Kube‑controller‑manager configuration:
<code>--kube-api-qps=100 # default 20
--kube-api-burst=100 # default 30</code>