
How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Etcd Best Practices

This guide explains how to adjust node quotas, kernel parameters, and etcd configurations for large Kubernetes clusters, covering cloud provider limits, GCE and Alibaba Cloud settings, API server tuning, and pod resource best practices to ensure reliable scaling and performance.


1. Node Quotas and Kernel Parameter Adjustments

When a Kubernetes cluster on a public cloud grows, you may run into the provider's quota limits and need to request increases. Quotas to enlarge include:

Number of virtual machines

Number of vCPUs

Number of internal IP addresses

Number of external IP addresses

Number of security groups

Number of route tables

Persistent storage size

Reference GCE master node types based on node count:

1‑5 nodes: n1-standard-1

6‑10 nodes: n1-standard-2

11‑100 nodes: n1-standard-4

101‑250 nodes: n1-standard-8

251‑500 nodes: n1-standard-16

More than 500 nodes: n1-standard-32

Reference Alibaba Cloud configuration (kernel parameters):

<code># fs.file-max: maximum number of open file handles
fs.file-max=1000000
# ARP cache size
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
# conntrack max entries
net.netfilter.nf_conntrack_max=10485760
# netdev max backlog
net.core.netdev_max_backlog=10000
# conntrack TCP timeout
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360
# inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
</code>
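To make these settings persistent, a common workflow is to place them in a drop-in file under /etc/sysctl.d/ and reload (the file name here is illustrative; applying them requires root):

```shell
# Install the tuning above as a drop-in file (name is illustrative), then reload.
sudo cp k8s-scale.conf /etc/sysctl.d/90-k8s-scale.conf
sudo sysctl --system

# Spot-check a value after reloading:
sysctl net.netfilter.nf_conntrack_max
```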

2. Etcd Database

A high‑availability etcd cluster can be built with etcd‑operator, which automates creation, scaling, backup, and upgrades of etcd clusters:

Create/Destroy: automatic deployment and removal of etcd clusters.

Resize: dynamic scaling of the cluster.

Backup: supports data backup and cluster restoration.

Upgrade: upgrade without service interruption.

Additional recommendations:

Use SSD storage for etcd.

Increase --quota-backend-bytes (default 2 GB) to raise the storage quota.

Configure a dedicated etcd cluster for kube‑apiserver events.
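The dedicated event store is wired up through the kube-apiserver --etcd-servers-overrides flag, which routes a resource to its own etcd endpoints (the hostnames below are illustrative):

```shell
# Route Event objects to a separate etcd cluster; endpoints are illustrative.
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379,https://etcd-main-1:2379 \
  --etcd-servers-overrides=/events#https://etcd-events-0:2379,https://etcd-events-1:2379 \
  ...
```

This keeps the high-churn Event objects from competing with core cluster state for etcd disk and compaction bandwidth.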

3. Kube APIServer Configuration

For node counts ≥ 3000, set:

<code>--max-requests-inflight=3000
--max-mutating-requests-inflight=1000
</code>

For node counts between 1000 and 3000, set:

<code>--max-requests-inflight=1500
--max-mutating-requests-inflight=500
</code>

Memory target (in MB) can be calculated as:

<code>--target-ram-mb=node_nums * 60
</code>
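As a worked example, for a 3000-node cluster the formula yields a 180000 MB target:

```shell
# Worked example: compute the --target-ram-mb value for a 3000-node cluster.
node_nums=3000
echo "--target-ram-mb=$(( node_nums * 60 ))"
# prints: --target-ram-mb=180000
```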

4. Pod Configuration

Best practices for pods include setting resource requests and limits, e.g.:

<code>spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage
</code>

Kubernetes classifies pods into QoS classes based on these settings: Guaranteed, Burstable, and BestEffort. When node resources are scarce, the kubelet evicts pods in order: BestEffort first, then Burstable, then Guaranteed.
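For instance, a pod whose requests equal its limits for every container lands in the Guaranteed class (a minimal sketch; the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo        # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25          # illustrative image
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"            # requests == limits → Guaranteed QoS
        memory: "512Mi"
```

Setting requests without matching limits (or limits alone) yields Burstable; omitting both yields BestEffort.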

Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads, for example the kube‑dns configuration:

<code>affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        topologyKey: kubernetes.io/hostname
</code>

Prefer managing containers with controllers such as Deployment, StatefulSet, DaemonSet, or Job. Adjust scheduler and controller‑manager QPS settings as needed (e.g., --kube-api-qps=100, --kube-api-burst=100).
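In a kubeadm-style cluster these flags go into the component's static pod manifest; a sketch for the scheduler (file paths are illustrative):

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-scheduler.yaml (paths illustrative)
command:
- kube-scheduler
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --kube-api-qps=100
- --kube-api-burst=100
```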

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
