How to Build a Production‑Ready, High‑Availability Kubernetes Cluster from Scratch
This guide walks through designing, deploying, securing, monitoring, backing up, and maintaining a production‑grade Kubernetes cluster, sharing real‑world pitfalls, configuration snippets, and best‑practice recommendations for high availability, security, observability, and upgrade strategies.
1. Architecture Design: Can Your Cluster Survive Peak Traffic?
Business‑driven architecture: Different workloads require distinct resource strategies. For compute‑intensive services, enable horizontal pod autoscaling (HPA) and Cluster Autoscaler. For I/O‑intensive workloads like log processing, use local SSDs with Local PersistentVolume. For mixed workloads, create dedicated node pools (e.g., gpu-worker, high-mem).
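As a minimal sketch of the HPA half of that advice, the manifest below scales a Deployment on CPU utilization. The Deployment name compute-api, the replica bounds, and the 70% target are illustrative assumptions, not values from the original setup. Cluster Autoscaler is configured separately and adds nodes when the extra replicas cannot be scheduled.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: compute-api-hpa          # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: compute-api            # hypothetical Deployment to scale
  minReplicas: 3                 # assumed floor for availability
  maxReplicas: 20                # assumed ceiling to cap cost
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out above 70% of requested CPU
```

The target is a percentage of the pods' CPU requests, so the Deployment's containers need resource requests set for this metric to work.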
2. High‑Availability Details
Load‑balancer pitfalls: Do not put a cloud provider HTTP(S) load balancer in front of the API server; use a TCP‑level (layer 4) LB such as HAProxy + Keepalived, with health checks against /readyz.
# HAProxy configuration example
backend k8s-api
    mode tcp
    balance roundrobin
    # The API server serves HTTPS on 6443, so the probe must use TLS (check-ssl)
    option httpchk GET /readyz
    http-check expect status 200
    server master1 10.0.0.1:6443 check check-ssl verify none
    server master2 10.0.0.2:6443 check check-ssl verify none
    server master3 10.0.0.3:6443 check check-ssl verify none
Cross‑datacenter deployment: For three zones, use a 5‑node control‑plane (etcd) layout (2‑A, 2‑B, 1‑C), so that losing any single zone still leaves a quorum of three members and avoids split‑brain.
Disk isolation: Keep etcd nodes on dedicated disks to prevent I/O contention.
Worker node “hot” and “cold” zones: Run critical services on hot nodes (no auto‑scaling) and batch jobs on cold nodes (e.g., Spot instances).
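The hot/cold split can be sketched with a node label plus a taint. The example below assumes cold (Spot) nodes carry the label and taint pool=cold; that key, the Job name, and the image are hypothetical placeholders rather than anything prescribed by the original setup.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                    # hypothetical batch job
spec:
  backoffLimit: 4                         # retry if a Spot node is reclaimed mid-run
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        pool: cold                        # assumed label on cold (Spot) nodes
      tolerations:
        - key: pool                       # assumed taint that keeps other pods off cold nodes
          operator: Equal
          value: cold
          effect: NoSchedule
      containers:
        - name: report
          image: your-registry/report:v1  # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```

Critical services omit the toleration, so the taint keeps them off the cold pool and they stay on hot nodes.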
3. Cluster Installation – Avoid Tool‑Chain Traps
3.1 kubeadm Pros & Cons
Suitable for: Small‑to‑medium clusters (<200 nodes) with standardized environments.
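Critical flaw: The control‑plane certificates that kubeadm issues expire after one year by default, causing outages if they are not renewed.
# Check when each certificate expires
kubeadm certs check-expiration
# Manually renew certificates before expiry, then restart the control-plane static pods
kubeadm certs renew all
# cert-manager (installed via Helm) automates certificates for workloads;
# it does not renew kubeadm's control-plane certificates
helm upgrade --install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace --set installCRDs=true
3.2 Binary‑level deployment (large‑scale, custom kernel)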
Generate a custom CA and API server certificates with cfssl to include LB IP and DNS names:
# Generate CA
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Generate API server cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json \
    -hostname=10.0.0.100,k8s-api.example.com,kubernetes.default.svc \
    apiserver-csr.json | cfssljson -bare apiserver
3.3 Network plugin performance comparison
Real‑world benchmark: a gaming company switched from Calico to Cilium and reduced network latency by ~40%.
Avoid mixing multiple CNI plugins in the same cluster. When using Cilium, disable kube‑proxy:
# Cilium 1.14 and later expect kubeProxyReplacement=true; "strict" is the older value
helm install cilium cilium/cilium --namespace=kube-system \
    --set kubeProxyReplacement=strict
4. Security Hardening
4.1 Authentication – Three Locks
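Lock 1 – Disable anonymous access. Unauthenticated probes of /readyz, such as a load‑balancer health check, are then rejected with 401 unless they present credentials:
# /etc/kubernetes/manifests/kube-apiserver.yaml
    - --anonymous-auth=false
Lock 2 – Fine‑grained RBAC: Example role granting read‑only access to pods and services in the payment namespace.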
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: payment
  name: payment-service-role
rules:
  - apiGroups: [""]
    resources: ["pods","services"]
    verbs: ["get","list"]
Lock 3 – Audit logging:
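# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record metadata (who, when, what) for secret deletions and patches
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["delete","patch"]
4.2 Runtime Security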
Pod Security Admission (PSA) replaces PodSecurityPolicy (PSP), which was deprecated in Kubernetes 1.21 and removed in 1.25. Enforce the restricted profile per namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: untrusted
  labels:
    pod-security.kubernetes.io/enforce: restricted
Image signature verification with cosign:
cosign verify --key cosign.pub your-registry/app:v1
5. Observability
5.1 Monitoring – Core Metrics
Key Prometheus recording rule for 99th‑percentile API server request latency:
groups:
  - name: k8s.rules
    rules:
      - record: cluster:apiserver_request_latency:percentile99
        expr: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
5.2 Logging – EFK Optimisation
Fluent Bit service configuration with filesystem buffering (storage.path) to avoid log backlog:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
    # Inputs must also set storage.type filesystem to buffer under this path
    storage.path /var/log/flb-storage/
Separate hot and cold indices in Elasticsearch, moving old indices to low‑cost storage.
6. Backup, Disaster Recovery & Chaos Engineering
6.1 Backup Strategy – 3‑2‑1 Rule
Three copies: local, cross‑region, offline (e.g., tape).
Two forms: Velero for Kubernetes resources + PV snapshots, and direct etcd snapshots.
Goal: restore within one hour; regularly rehearse RTO.
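The Velero half of this strategy might be scheduled as in the sketch below. It assumes Velero is installed in the velero namespace with a snapshot‑capable storage provider; the schedule name, the cron expression, the namespace list, and the 30‑day retention are illustrative assumptions.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-critical          # hypothetical schedule name
  namespace: velero             # assumes the default Velero install namespace
spec:
  schedule: "0 2 * * *"         # every day at 02:00 (cron syntax)
  template:
    includedNamespaces:
      - payment                 # example critical namespace
    snapshotVolumes: true       # also snapshot the PersistentVolumes
    ttl: 720h0m0s               # keep each backup for 30 days (assumed retention)
```

This covers Kubernetes resources and PV snapshots only; etcd snapshots and the cross‑region and offline copies still need their own jobs.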
6.2 Chaos Engineering
Example Chaos Mesh experiment that kills a CoreDNS pod to test DNS resilience:
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-core-dns
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - kube-system   # CoreDNS pods run in kube-system
    labelSelectors:
      "k8s-app": "kube-dns"
  gracePeriod: 0      # immediate kill
7. Upgrade & Maintenance
7.1 Rolling Upgrade Best Practices
Check manifests for deprecated or removed APIs before upgrading, and migrate them (e.g., with the kubectl convert plugin).
Upgrade masters first, then workers.
Never skip minor versions; upgrade 1.24 → 1.25 → 1.26.
Keep old etcd snapshots and binaries for rollback.
Use Velero to back up critical namespaces before upgrade.
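Upgrading workers means draining them node by node, and a PodDisruptionBudget is one common way to keep a service available while that happens. The sketch below is an assumption layered on the checklist above, not part of it: the payment-api name, its app label, and the minimum of two pods are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb         # hypothetical name
  namespace: payment
spec:
  minAvailable: 2               # kubectl drain waits rather than go below 2 ready pods
  selector:
    matchLabels:
      app: payment-api          # assumed label on the protected pods
```

With this in place, kubectl drain evicts pods only while at least two matching pods remain available, so the Deployment needs more than two replicas for a drain to make progress.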
7.2 Daily Operations – Hidden Issues
Catch manifests that risk resource exhaustion (for example, missing resource requests and limits) with kube-score:
kube-score score deployment.yaml
Configure image garbage collection thresholds in kubelet:
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
Building a production‑grade Kubernetes cluster is akin to constructing a deep‑sea vessel: the design must anticipate storms, the hull must resist hidden reefs, and continuous iteration ensures lasting stability.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
