How to Achieve Zero‑Downtime Deployments with Kubernetes

Learn how to configure Kubernetes for zero‑downtime applications by syncing container images, ensuring multiple pod replicas, using PodDisruptionBudgets, selecting appropriate deployment strategies, setting up liveness/readiness probes, handling graceful termination, applying pod anti‑affinity, and enabling autoscaling and proper resource limits.

MaGe Linux Operations

Aug 31, 2023

I have worked with both local and hosted Kubernetes clusters for over seven years, and containers have completely changed the hosting landscape, offering features like rolling restarts, zero downtime, and health checks that previously required complex setups.

Container Image Location

While pulling images is easy with Docker, in production you often do not want to rely on an uncontrolled remote registry. Risks include registry disappearance, deleted tags, mutable tags causing inconsistent behavior, and security compliance requirements.

The common solution is to sync container images from the source registry to your own private registry.
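
Once an image is mirrored, workloads can reference the private copy by digest rather than by tag, so that a re-pushed tag cannot change what runs. The fragment below is a minimal sketch under stated assumptions: the registry host registry.example.internal, the repository path, and the pull secret name are all hypothetical, and the digest is a placeholder to be replaced with the value your own registry reports.

```yaml
# Pod template fragment (goes under spec.template.spec of a Deployment).
# Registry host, repository path, and secret name are hypothetical.
imagePullSecrets:
- name: private-registry-credentials
containers:
- name: nginx
  # Pinning by digest avoids surprises from mutable tags. Replace the
  # placeholder with the real sha256 digest of the mirrored image.
  image: registry.example.internal/mirror/nginx@sha256:<digest-of-mirrored-image>
```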

Pod Count (Application Instances)

For high availability, run at least two replicas (two Pods) of the application. Example Deployment:

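apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2 # tells the Deployment to run 2 pods matching the template
  selector: # required in apps/v1; must match the labels set in the template
    matchLabels:
      app: nginx # assumed label; the original template is elided
  template:
    ...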

A common misconception is that rolling updates make a single replica sufficient. They do not: you still need multiple instances to stay available during node failures, scaling events, or any other situation in which a pod receives SIGTERM.

Pod Disruption Budget

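A PodDisruptionBudget (PDB) limits how many Pods may be unavailable at once during voluntary disruptions such as node drains or cluster upgrades, so that maintenance does not take the application down. It does not protect against involuntary disruptions such as node crashes.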

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

Deployment Strategies

A Kubernetes Deployment supports two strategy types:

RollingUpdate (default): updates pods gradually.

Recreate: shuts down all pods before starting new ones.

To control how fast a rollout proceeds, for example under heavy traffic, tune maxUnavailable and maxSurge. Each accepts either an absolute number of pods or a percentage of the desired replica count.
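
As an illustration of these two settings, the fragment below is a sketch of a conservative rollout that never drops below the desired replica count. The specific numbers and the use of minReadySeconds are assumptions chosen for the example, not values from the original article.

```yaml
# Deployment fragment (goes under spec of a Deployment).
# The numbers are illustrative assumptions, not recommendations.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # never drop below the desired replica count during a rollout
    maxSurge: 1         # create at most one extra pod at a time ("25%" would also be valid)
minReadySeconds: 10     # a new pod must stay ready this long before it counts as available
```

With maxUnavailable set to 0, an old pod is only removed after its replacement is ready, which trades rollout speed for capacity. The two values cannot both be 0.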

Automatic Rollback

Kubernetes does not provide automatic rollback out of the box; you need third‑party tools like Helm, ArgoCD, or Spinnaker. Helm offers flags such as --wait, --wait-for-jobs, and --atomic to help.

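These tools decide whether a rollout succeeded by waiting for the new pods to become ready, so properly configured probes are what allow a failing pod to trigger a rollback.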

Probes

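Liveness probes determine whether a container is still healthy; if they fail repeatedly, the kubelet restarts the container. Readiness probes control whether a pod receives traffic: a pod that fails its readiness probe is removed from Service endpoints until it passes again. Custom application‑level probes are often more reliable than simple TCP checks.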
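
The two probe types can be sketched together as follows. The /healthz and /ready paths, port 8080, and the timing values are hypothetical; substitute whatever endpoints your application actually exposes.

```yaml
# Container fragment (goes under spec.template.spec.containers[] of a Deployment).
# Paths, port, and timings are hypothetical placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # restart the container after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 2    # stop routing traffic to the pod after 2 consecutive failures
```

Keeping the two endpoints separate lets the application report "alive but not ready" (for example, while a dependency is unavailable) without being restarted.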

Initial Startup Delay

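Applications that take a long time to start (e.g., Java services, heavy initialization, database schema loading) may need a larger initialDelaySeconds on their liveness probe. Otherwise the probe can fail before startup completes, and the container is restarted in a loop.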

livenessProbe:
  initialDelaySeconds: 60
  httpGet:
    ...

Graceful Termination (terminationGracePeriodSeconds)

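When a pod is terminated, Kubernetes sends SIGTERM to its containers and waits up to terminationGracePeriodSeconds before sending SIGKILL. Graceful termination therefore only works if the application handles SIGTERM by finishing in‑flight work and then exiting. Without it, pods are killed abruptly, leading to poor user experience, data loss, or unrecoverable state. The default grace period is 30 seconds, and you can extend it as needed.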
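
A longer grace period might be configured as in the sketch below. The 60-second value, the container name, the image, and the optional preStop pause are all assumptions made for illustration; the preStop hook also assumes the image contains a sleep binary.

```yaml
# Pod template fragment (goes under spec.template.spec of a Deployment).
# Grace period, names, and the preStop pause are illustrative assumptions.
terminationGracePeriodSeconds: 60   # total time allowed before SIGKILL, including the preStop hook
containers:
- name: my-app
  image: registry.example.internal/my-app:1.0.0   # hypothetical image
  lifecycle:
    preStop:
      exec:
        # Runs before SIGTERM is sent; a short pause gives load balancers
        # time to stop routing new requests to this pod.
        command: ["sleep", "5"]
```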

Pod Anti‑Affinity

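Pod anti‑affinity keeps multiple instances of the same application off the same node (or zone), so that a single node‑level outage cannot take down every replica. A rule can be soft (preferredDuringSchedulingIgnoredDuringExecution) or hard (requiredDuringSchedulingIgnoredDuringExecution). The example below shows the syntax for both: a hard podAffinity rule that places the pod in the same zone as pods labeled security=S1, and a soft podAntiAffinity rule that prefers zones without pods labeled security=S2.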

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S1
      topologyKey: topology.kubernetes.io/zone
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S2
        topologyKey: topology.kubernetes.io/zone
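
For the specific goal described above, spreading replicas of one application across nodes, a soft rule keyed on the node hostname is a common starting point. The sketch below assumes the application's pods carry the hypothetical label app: my-app.

```yaml
# Pod template fragment (goes under spec.template.spec of a Deployment).
# Assumes the application's pods carry the label app: my-app (hypothetical).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:   # soft rule
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname   # treat each node as a separate domain
```

A soft rule still lets pods schedule when there are fewer nodes than replicas. Switching to requiredDuringSchedulingIgnoredDuringExecution would leave the extra pods pending instead.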

Resources

Insufficient memory leads to OOM kills; insufficient CPU can cause slow responses or failed readiness checks. Proper limits and requests are essential.
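
Requests and limits are set per container. The values in this sketch are placeholders rather than recommendations; realistic numbers should come from observed usage of your own workload.

```yaml
# Container fragment (goes under spec.template.spec.containers[] of a Deployment).
# All values are placeholders; derive real ones from observed usage.
resources:
  requests:
    cpu: 250m        # used by the scheduler to place the pod
    memory: 256Mi
  limits:
    cpu: "1"         # exceeding this throttles the container rather than killing it
    memory: 512Mi    # exceeding this gets the container OOM-killed
```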

Autoscaling

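The Horizontal Pod Autoscaler (HPA) adds pods when average CPU utilization exceeds a target and removes them when load drops, helping avoid downtime under load. Utilization is computed relative to each container's CPU request, so the target workload must declare CPU requests.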

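apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx # assumed name; the original omits metadata
spec:
  scaleTargetRef: # required; assumed here to point at the nginx Deployment shown earlier
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2 # keep at least two pods, consistent with the replica advice above
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50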

Conclusion

Kubernetes can deliver magical reliability, but only when applications are truly cloud‑native and correctly configured. Key takeaways:

Run at least two instances.

Add health checks (probes).

Handle SIGTERM gracefully.

Configure autoscaling.

Allocate sufficient resources.

Use pod anti‑affinity.

Add a PodDisruptionBudget.

When everything is set up properly, the Kubernetes experience is seamless and downtime‑free.
