
Mastering Kubernetes StatefulSet: Architecture, Access, and Lifecycle Management

This article explains Kubernetes StatefulSet fundamentals, its headless service networking, access patterns, creation workflow, controller mechanics, and detailed procedures for updating, scaling, and deleting stateful pods with illustrative code examples.

MaGe Linux Operations

Feb 15, 2024


1. Introduction to StatefulSet

StatefulSet is a workload API object for managing stateful applications in Kubernetes. It manages a set of Pods created from the same container specification but, unlike a Deployment, gives each Pod a stable, persistent identity: a fixed ordinal, a stable network name, and its own persistent storage. Each Pod keeps this sticky identity across rescheduling, which enables the ordered deployment, scaling, and termination that stateful workloads require.

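StatefulSet Pods get their network identities from a Headless Service, named in the StatefulSet's spec.serviceName, which you must create yourself. The Service publishes a resolvable DNS record for each Pod, so members of the StatefulSet can reach one another by name.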
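Concretely, each Pod is named <statefulset-name>-<ordinal>, and the headless Service gives it the DNS name <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>. The Go sketch below only assembles those names; podDNSNames is a hypothetical helper, and it assumes ordinals start at 0 (the default) and that you pass in your cluster's DNS domain, commonly cluster.local.

```go
package main

import "fmt"

// podDNSNames builds the stable DNS name of each Pod in a StatefulSet.
// Hypothetical helper: assumes ordinals start at 0 and that clusterDomain is
// the cluster's DNS domain (often "cluster.local").
func podDNSNames(setName, serviceName, namespace, clusterDomain string, replicas int) []string {
	names := make([]string, 0, replicas)
	for ordinal := 0; ordinal < replicas; ordinal++ {
		podName := fmt.Sprintf("%s-%d", setName, ordinal) // e.g. "web-0"
		names = append(names, fmt.Sprintf("%s.%s.%s.svc.%s", podName, serviceName, namespace, clusterDomain))
	}
	return names
}

func main() {
	// Names from the example manifest; the "default" namespace is assumed.
	for _, name := range podDNSNames("web", "nginx", "default", "cluster.local", 3) {
		fmt.Println(name)
	}
}
```

For the StatefulSet web behind the Service nginx, assuming the default namespace, this prints web-0.nginx.default.svc.cluster.local through web-2.nginx.default.svc.cluster.local.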

2. Access Methods for StatefulSet Workloads

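Clients can reach StatefulSet Pods in the same ways they reach other workloads such as Deployments, but StatefulSets usually sit behind a Headless Service (clusterIP: None). Because such a Service has no cluster IP, kube‑proxy programs no iptables/IPVS rules for it; instead, cluster DNS returns the Pod IPs directly, so clients discover backend instances via DNS. The manifest below defines a Headless Service named nginx and a three-replica StatefulSet named web that uses it: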

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # must match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # default 1
  minReadySeconds: 10 # default 0
  template:
    metadata:
      labels:
        app: nginx # must match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

3. Creation Process of a StatefulSet

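A user submits the StatefulSet manifest to the API server, for example with kubectl apply -f.

The API server authenticates and authorizes the request, runs admission control and validation, then persists the object to etcd.

The StatefulSet controller (part of kube‑controller‑manager) does not read etcd directly; it watches the API server through an informer, which holds a long-lived watch connection and maintains a local cache.

When the new StatefulSet is stored, the API server sends the controller a watch event containing the object.

The controller reconciles the desired replica count against the existing Pods and creates Pods from the template in ordinal order (0, 1, …, N‑1). With the default OrderedReady policy, it creates Pod n only after Pods 0 through n‑1 are Running and Ready.

Each Pod creation is itself a request to the API server, which stores the Pod in etcd; the scheduler then assigns it to a node, and the kubelet starts it and reports its status back through the API server.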
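The ordered creation described above can be modelled in a few lines. This is a simplified simulation, not controller code: podState and nextPodToCreate are hypothetical, and the kubelet's status reports are faked by flipping a flag. Under the OrderedReady policy the function returns the lowest missing ordinal only when every lower ordinal is Running and Ready; with podManagementPolicy: Parallel the real controller skips this wait.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of one Pod slot.
type podState struct {
	created      bool
	runningReady bool
}

// nextPodToCreate models OrderedReady scale-up: it returns the lowest ordinal
// that has no Pod yet, but only if every lower ordinal is Running and Ready.
// It returns -1 when the controller must wait or nothing is left to create.
func nextPodToCreate(pods []podState) int {
	for i, p := range pods {
		if !p.created {
			return i
		}
		if !p.runningReady {
			return -1 // a predecessor is not Running and Ready yet: wait
		}
	}
	return -1
}

func main() {
	pods := make([]podState, 3) // desired replicas: 3, none created yet
	for {
		i := nextPodToCreate(pods)
		if i < 0 {
			break
		}
		pods[i].created = true
		fmt.Printf("created web-%d; next candidate while it starts: %d\n", i, nextPodToCreate(pods))
		pods[i].runningReady = true // pretend the kubelet reported Running and Ready
	}
}
```

Each printed line ends in -1, showing that no further Pod is created until the newest one becomes Running and Ready.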

4. StatefulSet Controller Working Principle

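The controller relies on the following components and mechanisms:

Informer: watches the Kubernetes API for changes to StatefulSets and the Pods and ControllerRevisions they own, and keeps a local cache up to date.

Event handler: callbacks that react to informer events, typically by queuing the affected StatefulSet for reconciliation.

Revision management: records each version of the Pod template as a ControllerRevision object, so the controller can tell current Pods from updated ones and roll back to an earlier revision.

Ordered Pod management: creates, updates, and terminates Pods in a defined ordinal order. Setting podManagementPolicy: Parallel relaxes this for scaling operations only; rolling updates stay ordered.

Replica arrays: replicas holds the existing Pods whose ordinals fall inside the desired range, indexed by ordinal, with gaps filled by newly created Pods; condemned holds the Pods whose ordinals are at or beyond the desired replica count and are therefore slated for deletion.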
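The split between replicas and condemned can be illustrated with plain ordinals. In this simplified sketch, partitionByOrdinal is a hypothetical helper that works on integers rather than Pod objects and assumes ordinals start at 0: ordinals below the desired replica count mark occupied slots (a false slot is a Pod the controller still has to create), larger ordinals are condemned and sorted highest-first, and negative values stand in for Pods whose ordinal cannot be parsed, which are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionByOrdinal is a simplified, hypothetical model of how existing Pods
// are split: present[i] reports whether ordinal i already has a Pod, and
// condemned lists out-of-range ordinals, highest first.
func partitionByOrdinal(existing []int, replicas int) (present []bool, condemned []int) {
	present = make([]bool, replicas)
	for _, ord := range existing {
		switch {
		case ord < 0:
			// Unparsable ordinal: ignored in this sketch.
		case ord < replicas:
			present[ord] = true
		default:
			condemned = append(condemned, ord)
		}
	}
	sort.Sort(sort.Reverse(sort.IntSlice(condemned))) // delete highest ordinal first
	return present, condemned
}

func main() {
	// Scaling from 5 Pods down to 3; ordinal 1 is missing and must be recreated.
	present, condemned := partitionByOrdinal([]int{0, 2, 3, 4}, 3)
	fmt.Println(present, condemned)
}
```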

4.1 Updating Pods

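Determine the update strategy from spec.updateStrategy.type: OnDelete or RollingUpdate (the default).

With OnDelete, the controller does not update Pods automatically when the template changes; a user must delete each Pod manually, and the controller recreates it from the new template.

With RollingUpdate, the controller updates one Pod at a time in reverse ordinal order (N‑1 down to 0): it deletes the Pod, recreates it at the new revision, and waits for it to be Running and Ready before moving on to its predecessor. An optional partition limits the update to Pods whose ordinal is greater than or equal to the partition value.

Throughout, the controller checks that each Pod's identity (name, labels, hostname) and storage still match the StatefulSet, and corrects any Pod that has drifted.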

4.2 Scaling Pods

Scaling up:

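Update the StatefulSet status; the new ordinals become empty slots in the replicas array, and Pods are created for them in ascending order.

Under OrderedReady, make sure every existing Pod is Running and Ready before creating the next one; failed Pods are deleted and recreated.

After creation, check that each Pod's identity and storage match the StatefulSet.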

Scaling down:

Update the StatefulSet status.

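Process the condemned array from the highest ordinal down, deleting one Pod at a time. Under OrderedReady, a Pod is deleted only after its successors have fully terminated, so if a condemned Pod is already terminating, the controller waits for it to finish before deleting the next one.

After deletion, check the identity consistency of the remaining Pods.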
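Which Pods a scaling operation touches, and in what order, follows directly from the ordinals. scalePlan below is a hypothetical helper that assumes the existing Pods hold contiguous ordinals 0 to current-1; it lists Pods to create lowest ordinal first and Pods to delete highest ordinal first, leaving out the Running and Ready or termination waits that OrderedReady adds between steps.

```go
package main

import "fmt"

// scalePlan returns the Pod names to create (lowest ordinal first) when scaling
// up, or to delete (highest ordinal first) when scaling down.
// Hypothetical helper; assumes existing ordinals are 0..current-1.
func scalePlan(setName string, current, desired int) (create, del []string) {
	for ord := current; ord < desired; ord++ {
		create = append(create, fmt.Sprintf("%s-%d", setName, ord))
	}
	for ord := current - 1; ord >= desired; ord-- {
		del = append(del, fmt.Sprintf("%s-%d", setName, ord))
	}
	return create, del
}

func main() {
	up, _ := scalePlan("web", 3, 5)
	_, down := scalePlan("web", 5, 2)
	fmt.Println("scale up creates:", up)
	fmt.Println("scale down deletes:", down)
}
```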

4.3 Deleting Pods

The core deletion logic is implemented in the processCondemned function:

func (ssc *defaultStatefulSetControl) processCondemned(ctx context.Context, set *apps.StatefulSet, firstUnhealthyPod *v1.Pod, monotonic bool, condemned []*v1.Pod, i int) (bool, error) {
    logger := klog.FromContext(ctx)
    // Pod already terminating: in monotonic (OrderedReady) mode, block until it
    // is gone; otherwise skip it and let the caller continue.
    if isTerminating(condemned[i]) {
        if monotonic {
            logger.V(4).Info("StatefulSet is waiting for Pod to Terminate prior to scale down",
                "statefulSet", klog.KObj(set), "pod", klog.KObj(condemned[i]))
            return true, nil
        }
        return false, nil
    }
    // In monotonic mode, do not delete a Pod that is not Running and Ready,
    // unless it is the first unhealthy Pod.
    if !isRunningAndReady(condemned[i]) && monotonic && condemned[i] != firstUnhealthyPod {
        logger.V(4).Info("StatefulSet is waiting for Pod to be Running and Ready prior to scale down",
            "statefulSet", klog.KObj(set), "pod", klog.KObj(firstUnhealthyPod))
        return true, nil
    }
    // Same rule for availability: the Pod must have been Ready for MinReadySeconds.
    if !isRunningAndAvailable(condemned[i], set.Spec.MinReadySeconds) && monotonic && condemned[i] != firstUnhealthyPod {
        logger.V(4).Info("StatefulSet is waiting for Pod to be Available prior to scale down",
            "statefulSet", klog.KObj(set), "pod", klog.KObj(firstUnhealthyPod))
        return true, nil
    }
    // All checks passed: delete the Pod. In monotonic mode, returning true makes
    // the caller stop after this single deletion.
    logger.V(2).Info("Pod of StatefulSet is terminating for scale down",
        "statefulSet", klog.KObj(set), "pod", klog.KObj(condemned[i]))
    return true, ssc.podControl.DeleteStatefulPod(set, condemned[i])
}

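For each condemned Pod, the function checks in turn whether the Pod is already terminating, whether it is Running and Ready, and whether it has been available for minReadySeconds. In monotonic mode (podManagementPolicy: OrderedReady), any failed check blocks the scale-down, except that the first unhealthy Pod may itself be deleted so the operation can make progress. In Parallel mode, terminating Pods are skipped and the others are deleted without waiting on health checks. The boolean result tells the caller whether to stop processing condemned Pods in the current sync.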
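Because processCondemned depends on controller internals, its branching is easier to experiment with in a standalone model. In the sketch below, condemnedPod and decision are hypothetical stand-ins for *v1.Pod and processCondemned; the readiness and availability checks are collapsed into one condition because both branches return the same result, and the actual DeleteStatefulPod call is replaced by a boolean.

```go
package main

import "fmt"

// condemnedPod is a hypothetical, simplified stand-in for *v1.Pod.
type condemnedPod struct {
	name        string
	terminating bool
	ready       bool // Running and Ready
	available   bool // Ready for at least minReadySeconds
}

// decision models the branches of processCondemned: shouldExit tells the
// caller to stop processing condemned Pods; deleteIt says whether this Pod
// would be deleted.
func decision(p, firstUnhealthy *condemnedPod, monotonic bool) (shouldExit, deleteIt bool) {
	if p.terminating {
		return monotonic, false // OrderedReady waits; Parallel moves on
	}
	if monotonic && p != firstUnhealthy && (!p.ready || !p.available) {
		return true, false // wait for this Pod to become healthy first
	}
	return true, true
}

func main() {
	web4 := &condemnedPod{name: "web-4", ready: true, available: true}
	web3 := &condemnedPod{name: "web-3", terminating: true}
	for _, p := range []*condemnedPod{web4, web3} {
		exit, del := decision(p, nil, true)
		fmt.Printf("%s: shouldExit=%v delete=%v\n", p.name, exit, del)
	}
}
```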




