How to Scale Kubernetes to 5,000 Nodes: Master, API Server, and Component Tuning

This guide explains how to scale a Kubernetes cluster toward the officially supported limit of 5,000 nodes. It covers the official limits, master node sizing on GCE and AWS, kube‑apiserver high availability and in‑flight request tuning, leader election settings for the scheduler and controller manager, kubelet optimizations, and DNS anti‑affinity configuration.

Full-Stack DevOps & Kubernetes

Dec 7, 2022

Kubernetes has officially supported up to 5,000 nodes per cluster since v1.6, but reaching that number in practice requires careful tuning of the control plane and of every node.

Official Limits

Maximum 5,000 nodes

Maximum 150,000 pods

Maximum 300,000 containers

Maximum 110 pods per node (documentation for older releases listed 100)
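The cluster-wide limits above interact: at full node count, the pod limit caps the average pod density well below the per-node maximum. A minimal sketch of that arithmetic follows; the function names are hypothetical and only the three cluster-wide numbers come from the list above.

```python
# Cluster-wide limits as listed above.
MAX_NODES = 5_000
MAX_PODS = 150_000
MAX_CONTAINERS = 300_000


def limit_violations(nodes: int, pods: int, containers: int) -> list[str]:
    """Return the cluster-wide limits a planned cluster exceeds (empty if none)."""
    checks = [
        ("nodes", nodes, MAX_NODES),
        ("pods", pods, MAX_PODS),
        ("containers", containers, MAX_CONTAINERS),
    ]
    return [f"{name}: {value} > {limit}" for name, value, limit in checks if value > limit]


def average_pod_budget(nodes: int) -> int:
    """Average pods per node that keeps the whole cluster under MAX_PODS."""
    return MAX_PODS // nodes


print(limit_violations(nodes=5_000, pods=200_000, containers=250_000))
print(average_pod_budget(5_000))  # 150,000 pods spread over 5,000 nodes
```

At 5,000 nodes the budget works out to an average of 30 pods per node, so a cluster that size cannot also run every node at its per-node maximum.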

Master Node Configuration Optimization

GCE recommended instance types:

1‑5 nodes: n1-standard-1

6‑10 nodes: n1-standard-2

11‑100 nodes: n1-standard-4

101‑250 nodes: n1-standard-8

251‑500 nodes: n1-standard-16

More than 500 nodes: n1-standard-32

AWS recommended instance types:

1‑5 nodes: m3.medium

6‑10 nodes: m3.large

11‑100 nodes: m3.xlarge

101‑250 nodes: m3.2xlarge

251‑500 nodes: c4.4xlarge

More than 500 nodes: c4.8xlarge

Corresponding CPU and memory for the GCE types (the AWS c4 types used for the two largest tiers have less memory per vCPU):

1‑5 nodes: 1 vCPU / 3.75 GiB

6‑10 nodes: 2 vCPU / 7.5 GiB

11‑100 nodes: 4 vCPU / 15 GiB

101‑250 nodes: 8 vCPU / 30 GiB

251‑500 nodes: 16 vCPU / 60 GiB

More than 500 nodes: 32 vCPU / 120 GiB
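The sizing tiers above amount to a lookup keyed on node count. The sketch below encodes them as data; the function and constant names are hypothetical, and the instance types are copied from the lists above without checking whether they are still offered.

```python
from bisect import bisect_left

# Inclusive upper bound of each node-count tier; the last tier is open-ended.
TIER_UPPER_BOUNDS = [5, 10, 100, 250, 500]

MASTER_TYPES = {
    "gce": ["n1-standard-1", "n1-standard-2", "n1-standard-4",
            "n1-standard-8", "n1-standard-16", "n1-standard-32"],
    "aws": ["m3.medium", "m3.large", "m3.xlarge",
            "m3.2xlarge", "c4.4xlarge", "c4.8xlarge"],
}


def recommended_master_type(provider: str, node_count: int) -> str:
    """Map a worker node count to the master instance type for its tier."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # bisect_left returns the index of the first bound >= node_count,
    # which is the tier index; counts above 500 fall into the last tier.
    tier = bisect_left(TIER_UPPER_BOUNDS, node_count)
    return MASTER_TYPES[provider][tier]


print(recommended_master_type("gce", 120))
print(recommended_master_type("aws", 3_000))
```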

kube‑apiserver Optimization

High Availability

Run multiple kube‑apiserver instances behind an external load balancer.

Set --apiserver-count and --endpoint-reconciler-type so that every running instance is registered in the endpoints of the kubernetes Service, which is how in‑cluster clients find the API server.

Because clients keep long‑lived TLS connections open and reuse them, the load balancer cannot spread individual requests evenly across instances; a server‑side rate limiter can be added to signal clients to back off when thresholds are reached.
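What "backing off" can look like on the client side is sketched below. It assumes the server rejects excess requests with a throttling response (typically HTTP 429) that may carry a Retry-After hint in seconds; the function name, base delay, and cap are hypothetical, and real clients usually add random jitter, which is left out here to keep the result deterministic.

```python
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server told us how long to wait; honor its hint.
        return retry_after
    # Otherwise fall back to capped exponential backoff: 1s, 2s, 4s, ...
    return min(cap, base * 2 ** attempt)


for attempt in range(6):
    print(attempt, retry_delay(attempt))
```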

Control Connection Count

The following flags control the number of in‑flight requests:

--max-mutating-requests-inflight int   The maximum number of mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 200)
--max-requests-inflight int            The maximum number of non‑mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 400)

Recommended settings:

For clusters with 1,000‑3,000 nodes:

--max-requests-inflight=1500
--max-mutating-requests-inflight=500

For clusters with more than 3,000 nodes:

--max-requests-inflight=3000
--max-mutating-requests-inflight=1000
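The two recommendations above can be expressed as a small helper that emits the flags for a given cluster size. This is a sketch with a hypothetical function name; below 1,000 nodes it returns nothing, on the assumption that the defaults are left in place because no recommendation is given for that range.

```python
def inflight_flags(node_count: int) -> list[str]:
    """Return kube-apiserver in-flight flags for a cluster of `node_count` nodes."""
    if node_count > 3_000:
        non_mutating, mutating = 3_000, 1_000
    elif node_count >= 1_000:
        non_mutating, mutating = 1_500, 500
    else:
        return []  # assumption: keep the defaults for smaller clusters
    return [
        f"--max-requests-inflight={non_mutating}",
        f"--max-mutating-requests-inflight={mutating}",
    ]


print(" ".join(inflight_flags(2_000)))
print(" ".join(inflight_flags(5_000)))
```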

kube‑scheduler and kube‑controller‑manager Optimization

High Availability

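Both components achieve HA via leader election: several replicas run, but only the current lease holder does any work. Add the following flags (the three timing values shown are the defaults). Older releases used endpoints as the resource lock; that lock type has since been deprecated and removed, so leases is shown here:

--leader-elect=true
--leader-elect-lease-duration=15s
--leader-elect-renew-deadline=10s
--leader-elect-resource-lock=leases
--leader-elect-retry-period=2s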
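The three timing flags are not independent: the lease must outlast the renew deadline, and the renew deadline must leave room for at least one retry. The sketch below checks those relations before a change is rolled out. The function name is hypothetical, and the 1.2 jitter factor is an assumption intended to mirror the check client-go performs, so verify it against the version you run.

```python
JITTER_FACTOR = 1.2  # assumption: mirrors client-go's leader-election jitter factor


def leader_election_problems(lease_duration: float, renew_deadline: float,
                             retry_period: float) -> list[str]:
    """Return the timing constraints violated by the given values (in seconds)."""
    problems = []
    if lease_duration <= renew_deadline:
        problems.append("lease duration must be greater than renew deadline")
    if renew_deadline <= JITTER_FACTOR * retry_period:
        problems.append("renew deadline must be greater than retry period * jitter factor")
    return problems


print(leader_election_problems(15, 10, 2))  # the values shown above
print(leader_election_problems(10, 10, 2))  # lease no longer outlasts the deadline
```

Shorter values speed up failover but make the leader more likely to lose its lease when the API server is slow, which is exactly the condition a very large cluster tends to produce.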

Control QPS

Recommended QPS limit for communication with the API server:

--kube-api-qps=100
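A rough way to see why this limit matters at scale is to compute the minimum time a component needs to issue a given number of API requests. The sketch below is plain arithmetic with a hypothetical function name; it ignores burst allowance and server latency, so real runs take longer.

```python
def min_seconds(requests: int, qps: float) -> float:
    """Lower bound on the time needed to send `requests` at a sustained `qps`."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return requests / qps


# One request per pod for 150,000 pods, at two different limits.
print(min_seconds(150_000, 100) / 60)  # minutes at QPS 100
print(min_seconds(150_000, 20) / 60)   # minutes at a much lower limit
```

At QPS 100 the bound is 25 minutes; at QPS 20 it is over two hours, which is why a low client-side limit becomes a bottleneck long before the API server does.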

Kubelet Optimization

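Set --image-pull-progress-deadline=30m so that slow pulls of large images are not cancelled for lack of progress.

Set --serialize-image-pulls=false to pull images in parallel (requires Docker with the overlay2 storage driver).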

Maximum pods per node: --max-pods=110 (default 110, adjust as needed)
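The effect of turning off serialized image pulls can be illustrated with a toy model: serialized pulls take the sum of the individual pull times, while parallel pulls are bounded below by the slowest one. The function name is hypothetical, and the parallel case optimistically assumes pulls do not compete for bandwidth or disk.

```python
def total_pull_seconds(durations: list[float], serialized: bool) -> float:
    """Time until all images are present, given each image's pull time."""
    if not durations:
        return 0.0
    # Serialized: one pull after another. Parallel: all start together,
    # so the slowest pull determines the total (optimistic assumption).
    return sum(durations) if serialized else max(durations)


pulls = [60.0, 20.0, 20.0]
print(total_pull_seconds(pulls, serialized=True))
print(total_pull_seconds(pulls, serialized=False))
```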

Cluster DNS High Availability

Configure anti‑affinity so that kube‑dns or CoreDNS pods are spread across different nodes, avoiding a single point of failure:

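affinity:
  podAntiAffinity:
    # "required" terms are hard rules and take no weight;
    # weight is only valid under preferredDuringSchedulingIgnoredDuringExecution.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      # At most one matching pod per node.
      topologyKey: kubernetes.io/hostname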
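Whether the DNS pods actually ended up on separate nodes can be checked from a pod-to-node mapping, for example one assembled from the output of kubectl get pods -o wide. The sketch below takes that mapping as a plain dictionary; the function name and the pod and node names are hypothetical.

```python
from collections import Counter


def crowded_nodes(placements: dict[str, str]) -> list[str]:
    """Return nodes hosting more than one DNS pod.

    `placements` maps pod name -> node name for the DNS pods only.
    """
    per_node = Counter(placements.values())
    return sorted(node for node, count in per_node.items() if count > 1)


print(crowded_nodes({"coredns-a": "node-1", "coredns-b": "node-2"}))
print(crowded_nodes({"coredns-a": "node-1", "coredns-b": "node-1", "coredns-c": "node-2"}))
```

With a required anti-affinity rule the second case should not occur for newly scheduled pods; instead, any replica beyond the number of eligible nodes stays Pending, so keep the replica count at or below the node count.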
