
How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Best Practices

This guide explains how to prepare large‑scale Kubernetes clusters on public clouds by expanding node quotas, tuning kernel parameters, configuring high‑availability etcd, adjusting kube‑apiserver limits, and applying pod‑level resource and affinity best practices.

Liangxu Linux

1. Node Quota and Kernel Parameter Adjustments

When a Kubernetes cluster grows on a public cloud, it can run into account-level quota limits, which must be raised in the cloud console before scaling out. Quotas to check include:

Number of virtual machines

vCPU count

Private IP addresses

Public IP addresses

Security group rules

Route table entries

Persistent storage size

As a reference, GCE sizes the master node according to the cluster's node count:

1‑5 nodes: n1-standard-1

6‑10 nodes: n1-standard-2

11‑100 nodes: n1-standard-4

101‑250 nodes: n1-standard-8

251‑500 nodes: n1-standard-16

>500 nodes: n1-standard-32
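The sizing tiers above can be expressed as a small lookup. This is only a sketch: the function name master_machine_type and the MASTER_SIZES table are illustrative names, not part of any GCE or Kubernetes API, and the tiers are copied directly from the list.

```python
# Upper node-count bound -> GCE machine type, copied from the list above.
MASTER_SIZES = [
    (5, "n1-standard-1"),
    (10, "n1-standard-2"),
    (100, "n1-standard-4"),
    (250, "n1-standard-8"),
    (500, "n1-standard-16"),
]


def master_machine_type(node_count):
    """Return the reference master machine type for a given node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for upper_bound, machine_type in MASTER_SIZES:
        if node_count <= upper_bound:
            return machine_type
    return "n1-standard-32"  # more than 500 nodes


print(master_machine_type(120))  # prints "n1-standard-8"
```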

Alibaba Cloud kernel tuning example (sysctl settings):

# Maximum number of open file handles
fs.file-max=1000000
# ARP cache size parameters
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
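# Netfilter connection tracking limits
net.netfilter.nf_conntrack_max=10485760
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360
# Maximum number of packets queued on the input side when an interface receives them faster than the kernel can process them
net.core.netdev_max_backlog=10000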
# Inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
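Before rolling settings like these out to many nodes, it can help to check which values a node is actually running with. The sketch below parses sysctl-style text and compares it against /proc/sys. The helper names (parse_sysctl, proc_path, report_drift) are hypothetical, and the key-to-path mapping assumes that no key component contains a dot of its own.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key=value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key such as fs.inotify.max_user_watches to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def report_drift(desired):
    """Return {key: (current, wanted)} for every key whose live value differs."""
    drift = {}
    for key, wanted in desired.items():
        try:
            current = proc_path(key).read_text().strip()
        except OSError:
            current = None  # key missing (e.g. module not loaded) or unreadable
        if current != wanted:
            drift[key] = (current, wanted)
    return drift
```

Calling report_drift(parse_sysctl(text)) on a node returns an empty dict when every listed value is already in effect. Settings are typically persisted in a file under /etc/sysctl.d/ and loaded with sysctl --system.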

2. Etcd Database

Deploy a highly available etcd cluster that can add members automatically as the Kubernetes cluster grows. A common approach is the etcd‑operator, which extends the Kubernetes API to manage the etcd lifecycle.

Key features of the etcd‑operator:

create/destroy: automatic provisioning and removal of etcd clusters

resize: dynamic scaling of cluster size

backup: supports backup and restore of etcd data

upgrade: upgrade etcd without service interruption

Additional recommendations:

Store etcd data on SSDs

Increase --quota-backend-bytes (default 2 GB) to raise the backend database size limit

Run a dedicated etcd cluster for event storage and point kube‑apiserver at it with --etcd-servers-overrides
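The last two recommendations translate into concrete flag values. The following is a rough sketch that builds them. The helper names and the example.internal URLs are made up for illustration, and the 8 GiB figure is only an example (the etcd documentation suggests about 8 GB as a practical upper bound for the quota).

```python
GIB = 1024 ** 3


def quota_backend_flag(gib):
    """etcd flag raising the backend quota to the given number of GiB."""
    return f"--quota-backend-bytes={gib * GIB}"


def events_override_flag(event_etcd_urls):
    """kube-apiserver flag that sends core-group Event objects to a separate etcd.

    The override format is group/resource#servers, with servers separated by ';'.
    """
    return "--etcd-servers-overrides=/events#" + ";".join(event_etcd_urls)


print(quota_backend_flag(8))
print(events_override_flag([
    "https://etcd-events-0.example.internal:2379",  # hypothetical endpoint
    "https://etcd-events-1.example.internal:2379",  # hypothetical endpoint
]))
```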

3. Kube APIServer Configuration

For clusters with ≥ 3000 nodes, set:

--max-requests-inflight=3000
--max-mutating-requests-inflight=1000

For clusters with 1000‑3000 nodes, set:

--max-requests-inflight=1500
--max-mutating-requests-inflight=500

Set the memory target (in MB) to about 60 MB per node, where node_nums stands for the number of nodes in the cluster:

--target-ram-mb=node_nums * 60
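The thresholds in this section can be folded into one helper that derives the flags from a node count. This is a sketch under two assumptions: a cluster of exactly 3000 nodes is placed in the larger tier (the two ranges above overlap at that value), and clusters below 1000 nodes keep the default in-flight limits because no values are given for them here. The function name apiserver_flags is illustrative.

```python
def apiserver_flags(node_count):
    """Return kube-apiserver flags following the tiers described in this section."""
    flags = []
    if node_count >= 3000:
        flags.append("--max-requests-inflight=3000")
        flags.append("--max-mutating-requests-inflight=1000")
    elif node_count >= 1000:
        flags.append("--max-requests-inflight=1500")
        flags.append("--max-mutating-requests-inflight=500")
    # Below 1000 nodes: no in-flight values are given, so the defaults are kept.
    flags.append(f"--target-ram-mb={node_count * 60}")
    return flags


print(apiserver_flags(2000))
```

For 2000 nodes this yields the 1500/500 in-flight limits together with --target-ram-mb=120000.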

4. Pod Configuration

Follow these best practices when running Pods:

Define resource requests and limits for containers, especially for core add‑on services.

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage

Kubernetes classifies Pods into QoS classes based on their limits and requests:

Guaranteed

Burstable

BestEffort

When resources are scarce, the kubelet evicts Pods in the order: BestEffort → Burstable → Guaranteed.
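How a Pod ends up in one of the three classes can be illustrated with a simplified classifier. It is a sketch rather than the kubelet's implementation: it looks only at cpu and memory, ignores init containers, and compares quantities as plain strings (so "500m" and "0.5" would be treated as different). The function name qos_class is illustrative.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a Pod from a list of dicts shaped like spec.containers[].resources."""
    any_set = False
    guaranteed = True
    for resources in containers:
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in RESOURCES:
            limit = limits.get(name)
            # A request that is not set defaults to the limit.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, with request == limit.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))  # prints "Guaranteed"
print(qos_class([{"requests": {"cpu": "100m"}}]))                   # prints "Burstable"
print(qos_class([{}]))                                              # prints "BestEffort"
```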

Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads across nodes. Example for kube‑dns anti‑affinity:

affinity:
  podAntiAffinity:
    # "required" terms take no weight; weight is only valid under "preferred" terms
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname

Prefer managing containers with higher‑level controllers such as Deployments, StatefulSets, DaemonSets, or Jobs.

Additional scheduler and controller manager tuning:

Set --kube-api-qps=100 for both kube‑scheduler and kube‑controller‑manager (the defaults are 50 and 20, respectively).

Set --kube-api-burst=100 (default 30) for the controller manager.
