How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Etcd Best Practices
This guide explains how to adjust node quotas, kernel parameters, and etcd configuration for large Kubernetes clusters. It covers cloud provider quota limits, reference GCE and Alibaba Cloud settings, kube-apiserver tuning, and pod resource best practices for reliable scaling and performance.
1. Node Quotas and Kernel Parameter Adjustments
As a Kubernetes cluster on a public cloud grows, you may hit the provider's quota limits and need to request increases. Quotas commonly worth enlarging include:
Number of virtual machines
Number of vCPUs
Number of internal IP addresses
Number of external IP addresses
Number of security groups
Number of route tables
Persistent storage size
Reference GCE master node types based on node count:
1‑5 nodes: n1-standard-1
6‑10 nodes: n1-standard-2
11‑100 nodes: n1-standard-4
101‑250 nodes: n1-standard-8
251‑500 nodes: n1-standard-16
More than 500 nodes: n1-standard-32
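The sizing table above can be sketched as a small shell helper; pick_master_type is a hypothetical name, and the thresholds simply mirror the table rather than any official formula:
<code># pick_master_type: return the reference GCE master machine type
# for a given worker-node count (thresholds from the table above).
pick_master_type() {
    nodes=$1
    if   [ "$nodes" -le 5 ];   then echo n1-standard-1
    elif [ "$nodes" -le 10 ];  then echo n1-standard-2
    elif [ "$nodes" -le 100 ]; then echo n1-standard-4
    elif [ "$nodes" -le 250 ]; then echo n1-standard-8
    elif [ "$nodes" -le 500 ]; then echo n1-standard-16
    else                            echo n1-standard-32
    fi
}

pick_master_type 120   # prints n1-standard-8
</code>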
Reference Alibaba Cloud configuration (kernel parameters):
<code># fs.file-max: system-wide maximum number of open file handles
fs.file-max=1000000
# ARP cache size
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
# conntrack max entries
net.netfilter.nf_conntrack_max=10485760
# netdev max backlog
net.core.netdev_max_backlog=10000
# conntrack established-TCP timeout (seconds)
net.netfilter.nf_conntrack_tcp_timeout_established=300
# conntrack hash table buckets
net.netfilter.nf_conntrack_buckets=655360
# inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
</code>
2. Etcd Database
A highly available etcd cluster can be built with etcd-operator, which automates the creation, scaling, backup, and upgrade of etcd clusters:
Create/Destroy: automatic deployment and removal of etcd clusters.
Resize: dynamic scaling of the cluster.
Backup: supports data backup and cluster restoration.
Upgrade: upgrade without service interruption.
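For illustration, a minimal EtcdCluster custom resource for etcd-operator might look like the following; the cluster name, size, and version are placeholders, not recommended values:
<code>apiVersion: "etcd.database.coreos.com/v1beta2"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster   # placeholder name
spec:
  size: 3            # number of etcd members
  version: "3.2.13"  # placeholder etcd version
</code>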
Additional recommendations:
Use SSD storage for etcd.
Increase --quota-backend-bytes (default 2 GB) to raise the etcd storage limit.
Store kube-apiserver events in a dedicated etcd cluster, separate from the main cluster state.
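The last two recommendations can be sketched as flag fragments; the endpoint addresses below are placeholders, and the quota value is just an example of raising the 2 GB default:
<code># Raise the etcd backend quota, e.g. from the 2 GB default to 8 GB:
etcd --quota-backend-bytes=8589934592 ...

# Route events to a dedicated etcd cluster via kube-apiserver
# (endpoint addresses are placeholders):
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379 \
  --etcd-servers-overrides=/events#https://etcd-events-0:2379 \
  ...
</code>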
3. Kube APIServer Configuration
For node counts ≥ 3000, set:
<code>--max-requests-inflight=3000
--max-mutating-requests-inflight=1000
</code>
For node counts between 1000 and 3000, set:
<code>--max-requests-inflight=1500
--max-mutating-requests-inflight=500
</code>
The memory target (in MB) can be calculated from the node count:
<code>--target-ram-mb=node_nums * 60
</code>
4. Pod Configuration
Best practices for pods include setting resource requests and limits, e.g.:
<code>spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage
</code>
Kubernetes assigns each pod a QoS class based on these settings: Guaranteed, Burstable, or BestEffort. Under resource pressure, the kubelet evicts BestEffort pods first, then Burstable, with Guaranteed pods evicted last.
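For example, a container whose requests equal its limits is placed in the Guaranteed class; the container name, image, and values below are placeholders:
<code>spec:
  containers:
  - name: app            # placeholder
    image: nginx:1.25    # placeholder
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"      # requests == limits -> Guaranteed QoS
        memory: "512Mi"
</code>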
Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads across nodes. For example, kube-dns uses pod anti-affinity to avoid co-locating its replicas:
<code>affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        topologyKey: kubernetes.io/hostname
</code>
Prefer managing containers through controllers such as Deployment, StatefulSet, DaemonSet, or Job rather than running bare pods. Adjust the scheduler's and controller-manager's API QPS settings as needed (e.g., --kube-api-qps=100, --kube-api-burst=100).
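As a sketch, running a container under a Deployment instead of as a bare pod looks like the following; the names, image, replica count, and resource values are all placeholders:
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: web               # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25  # placeholder
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
</code>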