How Kubernetes Manages Storage Snapshots and Topology‑Aware Scheduling
This article explains the background of storage snapshots in Kubernetes, the VolumeSnapshot and VolumeSnapshotClass APIs, the concept of storage topology, how delayed binding and topology‑aware scheduling solve common PV placement problems, and provides step‑by‑step YAML examples and a live demo of the whole workflow.
Basic Knowledge
Kubernetes uses PersistentVolume (PV) and PersistentVolumeClaim (PVC) to abstract storage. To improve data reliability and enable fast copy or migration, snapshots of a volume are created through the CSI snapshot controller, and a snapshot is restored by provisioning a new volume from it.
Storage Snapshot Background
Snapshots provide a point‑in‑time copy of a volume. Users declare a VolumeSnapshot object that references a VolumeSnapshotClass, which in turn specifies the CSI driver that actually creates the snapshot.
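As a minimal sketch of the class described above, the manifest below assumes the snapshot.storage.k8s.io/v1 API and an Alibaba Cloud disk driver. The class name and the driver name are assumptions chosen for illustration, so substitute the values your cluster actually uses.

```yaml
# Sketch only: the class name and driver are assumptions, not taken from the article.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: disk-snapshot-class                # hypothetical name
driver: diskplugin.csi.alibabacloud.com    # assumed CSI driver; must match an installed driver
deletionPolicy: Delete                     # delete the backend snapshot together with its content object
```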
Storage Snapshot User Interface – Snapshot
Similar to PVC/PV, a VolumeSnapshot is created by specifying the source PVC. The controller generates a VolumeSnapshotContent that stores the snapshot ID returned by the storage provider.
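The relationship between the two objects might look like the sketch below. All names are hypothetical, and the second document is an abbreviated view of what the controller would generate rather than something a user writes; the handle values are placeholders, not real IDs.

```yaml
# Created by the user: snapshot request for an existing PVC (names are hypothetical).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: disk-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: disk-snapshot-class   # hypothetical class name
  source:
    persistentVolumeClaimName: disk-pvc          # hypothetical source PVC
---
# Generated by the snapshot controller (abbreviated; values are placeholders).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-<uid>                        # generated name, placeholder
spec:
  deletionPolicy: Delete
  driver: diskplugin.csi.alibabacloud.com        # assumed CSI driver
  source:
    volumeHandle: <volume-id>                    # ID of the volume that was snapshotted
  volumeSnapshotRef:
    name: disk-snapshot
    namespace: default
status:
  snapshotHandle: <snapshot-id>                  # snapshot ID returned by the storage provider
  readyToUse: true
```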
Storage Snapshot User Interface – Restore
To restore, a new PVC is created with its dataSource field pointing to the previously created VolumeSnapshot. The CSI provisioner then creates a new PV whose data originates from the snapshot.
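A restore PVC could be written roughly as follows. The PVC, StorageClass, and snapshot names are hypothetical, and the requested size is an assumption; in practice it should be at least as large as the snapshotted volume.

```yaml
# Sketch only: names and size are assumptions for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc                  # hypothetical name
spec:
  storageClassName: csi-disk          # hypothetical StorageClass backed by the same CSI driver
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                   # assumed size; should not be smaller than the source volume
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: disk-snapshot               # hypothetical VolumeSnapshot to restore from
```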
Topology – Meaning
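Topology describes the logical location of nodes in a cluster (region, zone, hostname). Labels such as failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone, and kubernetes.io/hostname are used to express these domains. Newer Kubernetes releases replace the two failure-domain.beta labels with topology.kubernetes.io/region and topology.kubernetes.io/zone.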
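For illustration, the relevant part of a Node object might look like this. The node name and region are hypothetical; the zone value reuses one that appears in the demo.

```yaml
# Fragment of a Node object showing only the topology labels (values are illustrative).
apiVersion: v1
kind: Node
metadata:
  name: worker-1                                             # hypothetical node name
  labels:
    failure-domain.beta.kubernetes.io/region: cn-hangzhou    # assumed region
    failure-domain.beta.kubernetes.io/zone: cn-hangzhou-b    # zone the node lives in
    kubernetes.io/hostname: worker-1                         # per-node topology domain
```

On a live cluster, `kubectl get nodes --show-labels` prints these labels for every node.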
Storage Topology Scheduling Background
When a PV has nodeAffinity restrictions (e.g., Local PV or zone‑restricted cloud disks), the scheduler must ensure that the pod using the PVC is placed on a node that satisfies those restrictions. Without this, a pod may be scheduled to a node that cannot access the bound PV, causing failures.
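A Local PV with such a restriction could be sketched as below. The PV name, path, StorageClass name, and node name are all assumptions; the point is the nodeAffinity block, which tells the scheduler that only one node can reach this volume.

```yaml
# Sketch only: names, path, and capacity are assumptions for illustration.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1                    # hypothetical name
spec:
  capacity:
    storage: 60Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage     # hypothetical StorageClass
  local:
    path: /mnt/disks/vol1             # assumed directory on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1            # the only node that can access this volume
```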
Why Access‑Location Restrictions Exist
Pod scheduling and PV creation or binding happen independently and in parallel, so a PV can be provisioned or bound before anyone knows which node will host the pod. This leads to two classic problems:
Local PV: the PV is tied to a specific node; if the pod is scheduled elsewhere, it cannot access the storage.
Zone‑restricted cloud disks: a disk created in one availability zone cannot be used by a pod scheduled in a different zone.
Solution – Delayed Binding and Topology‑Aware Scheduling
Kubernetes delays the binding of PVC to PV until after the pod is scheduled. The scheduler then considers both compute resources and storage topology when selecting a node. This requires three components to support delayed binding:
PV controller with delayed binding support.
Dynamic provisioner that can create PVs after the node is known.
kube‑scheduler extended to evaluate storage topology during pre‑filter and filter phases.
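On the user side, delayed binding is switched on through the StorageClass. The sketch below assumes statically created Local PVs, which is why it uses the kubernetes.io/no-provisioner placeholder; the class name is hypothetical.

```yaml
# Sketch only: a StorageClass that defers PVC/PV binding until a pod is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                       # hypothetical name
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning; PVs are created by hand
volumeBindingMode: WaitForFirstConsumer     # bind only after a consuming pod has a node
```

With the default mode, Immediate, the PVC would be bound as soon as it is created, before the scheduler has any say.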
Processing Flow
Snapshot Flow: User creates a VolumeSnapshot → CSI snapshot controller watches it → CSI driver creates the snapshot on the storage backend → controller writes a VolumeSnapshotContent with the snapshot ID and binds the snapshot.
Restore Flow: User creates a PVC with dataSource set to the snapshot → CSI provisioner watches the PVC → provisioner creates a new volume from the snapshot ID → a new PV is created and bound to the PVC → pod can read restored data.
Topology‑Aware Scheduling Flow:
Scheduler receives a pod and performs the usual pre‑filter (resource matching).
It then checks each candidate node against the nodeAffinity of PVs already bound to the pod's PVCs, and against the allowedTopologies of the StorageClass for PVCs that use delayed binding.
Nodes that satisfy both compute and storage constraints are scored and the best node is selected.
After node selection, the scheduler updates the PVC/PV objects to trigger either delayed binding or dynamic provisioning.
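One way to picture the last step for dynamic provisioning: the scheduler records its choice on the PVC as an annotation, and the provisioner reacts to it. The fragment below is an illustrative view of such a PVC, with hypothetical names; the annotation is written by the scheduler, not by the user.

```yaml
# Illustrative view of a delayed-binding PVC after the scheduler has chosen a node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-pvc                                    # hypothetical name
  annotations:
    volume.kubernetes.io/selected-node: worker-1    # set by the scheduler; hypothetical node name
spec:
  storageClassName: csi-disk-topology               # hypothetical delayed-binding StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                                 # assumed size
```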
Demo Walkthrough
The demo runs on a three‑node cluster (one master, two workers). After installing the CSI snapshot plugin (csi‑external‑snapshot) and the CSI disk plugin, the following steps are performed:
Create a StorageClass with volumeBindingMode: WaitForFirstConsumer and, for dynamic provisioning, an allowedTopologies field limiting the zone.
Create a PVC that triggers delayed binding; it stays in Pending until a pod uses it.
Deploy a pod that uses the PVC; the scheduler picks a node whose labels match the topology constraints, causing the PVC to bind and the PV to be created.
Verify that the pod becomes Running after the topology match is corrected (changing the allowed zone from cn‑hangzhou‑d to cn‑hangzhou‑b).
Create a VolumeSnapshotClass and a VolumeSnapshot for the PVC, then inspect the generated VolumeSnapshotContent to see the snapshot ID.
Delete the snapshot and observe the cleanup of the associated content.
Throughout the demo, screenshots of the kubectl output and the node label inspection are shown to illustrate why the pod initially remained pending and how correcting the topology constraints resolves the issue.
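The first three demo steps could be reproduced with manifests along these lines. This is a sketch under stated assumptions, not the manifests used in the demo: every object name, the driver name, the topology key, the size, and the container image are assumptions. Driver-specific parameters are left out, and the topology key must be one that the installed CSI driver actually reports on the nodes.

```yaml
# Step 1: StorageClass with delayed binding and a zone restriction (sketch only).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-disk-topology                       # hypothetical name
provisioner: diskplugin.csi.alibabacloud.com    # assumed CSI driver
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.diskplugin.csi.alibabacloud.com/zone   # assumed topology key
        values:
          - cn-hangzhou-b                       # zone that matches the worker nodes
---
# Step 2: PVC that stays Pending until a pod consumes it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-pvc                                # hypothetical name
spec:
  storageClassName: csi-disk-topology
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                             # assumed size
---
# Step 3: Pod whose scheduling triggers provisioning and binding.
apiVersion: v1
kind: Pod
metadata:
  name: disk-consumer                           # hypothetical name
spec:
  containers:
    - name: app
      image: nginx                              # assumed image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: disk-pvc
```

If the zone listed under allowedTopologies matches no node, the pod stays Pending, which is the situation the demo corrects by switching the zone from cn‑hangzhou‑d to cn‑hangzhou‑b.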
Conclusion
The article covers three main points: (1) the Kubernetes resources and usage patterns for storage snapshots, (2) the necessity of storage topology‑aware scheduling illustrated by real‑world failure scenarios, and (3) an in‑depth analysis of the internal mechanisms that make delayed binding and topology‑aware scheduling work.
Alibaba Cloud Native
