Mastering Kubernetes Storage: PV/PVC, Controllers, FlexVolume & CSI Explained
This article provides a comprehensive guide to Kubernetes storage architecture. It covers PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs); the roles of the PV Controller, the AD (Attach/Detach) Controller, and the Volume Manager; the FlexVolume plugin system; and the CSI framework, with deployment and usage examples.
Introduction
Kubernetes storage is the foundation for stateful services: volumes let data outlive the containers and Pods that use it. Kubernetes ships built-in (in-tree) volume plugins and also provides out-of-tree plugin mechanisms that let external storage providers integrate without changing Kubernetes itself.
Mounting a Volume – Example Walkthrough
A StatefulSet YAML defines a PVC named disk-pvc with a storageClassName. Mounting the volume takes six steps:
The user creates a Pod that references the PVC.
The PV Controller watches for unbound PVCs and tries to bind each one to a suitable PV; if none matches, a new PV is dynamically provisioned through the StorageClass.
The Scheduler assigns the Pod to a Node based on node selectors, affinities, and volume-related predicates.
If the volume is not yet attached to that node, the AD Controller invokes the Volume Plugin to attach it; the remote volume then appears on the node as a block device (e.g., /dev/vdb).
The Volume Manager on the node formats the device if it has no filesystem yet and mounts it to a global path.
Finally, the global mount is bind-mounted into the Pod's volume directory, which makes it visible inside the container.
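To make the ordering and dependencies between these six steps concrete, here is a toy Python model. It is a sketch under simplifying assumptions: the names (VolumeState, run_mount_flow) and the global path layout are hypothetical, and real components run concurrently and coordinate through the API server rather than through a single function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VolumeState:
    """Hypothetical record of how far one PVC-backed volume has progressed."""
    pod_created: bool = False
    bound_pv: Optional[str] = None
    node: Optional[str] = None
    device: Optional[str] = None
    global_path: Optional[str] = None
    container_path: Optional[str] = None
    log: List[str] = field(default_factory=list)


def run_mount_flow(pvc: str, node: str, device: str = "/dev/vdb") -> VolumeState:
    s = VolumeState()
    # 1. The user creates a Pod that references the PVC.
    s.pod_created = True
    s.log.append("create-pod")
    # 2. PV Controller binds the PVC to an existing or newly provisioned PV.
    s.bound_pv = f"pv-{pvc}"
    s.log.append("bind")
    # 3. Scheduler places the Pod on a node.
    assert s.pod_created
    s.node = node
    s.log.append("schedule")
    # 4. AD Controller attaches the volume; it shows up as a block device.
    assert s.bound_pv and s.node, "attach needs a bound PV and a chosen node"
    s.device = device
    s.log.append("attach")
    # 5. Volume Manager formats the device if needed and mounts it globally.
    assert s.device
    s.global_path = f"/var/lib/kubelet/hypothetical-global/{s.bound_pv}"
    s.log.append("mount-device")
    # 6. The global mount is bind-mounted into the Pod and seen by the container.
    assert s.global_path
    s.container_path = "/data"
    s.log.append("bind-into-pod")
    return s


state = run_mount_flow("disk-pvc", "node-1")
print(state.log)
```

Each step asserts the precondition the previous step establishes, which is the main point: attach cannot happen before both binding and scheduling are done.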
Kubernetes Storage Architecture
PV Controller: Manages PV/PVC lifecycle, binding, provisioning, and deletion.
AD Controller: Handles attach/detach operations, maintaining DesiredStateOfWorld and ActualStateOfWorld.
Volume Manager: Executes mount/unmount, formatting, and global path handling on each node.
Volume Plugins: Provide concrete implementations for provision, attach, mount, etc., and are divided into In-Tree (bundled with Kubernetes) and Out-of-Tree (e.g., FlexVolume, CSI).
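The split between what every plugin must do and what only some plugins can do can be sketched with abstract base classes. This is a hypothetical Python sketch that loosely echoes how Kubernetes separates plugin capabilities into distinct Go interfaces (for example, an attachable-plugin interface); the class and method names below are illustrative, not the real API.

```python
from abc import ABC, abstractmethod


class VolumePlugin(ABC):
    """Every plugin can expose a volume to a Pod (setup) and remove it (teardown)."""

    @abstractmethod
    def setup(self, volume_id: str, pod_dir: str) -> None: ...

    @abstractmethod
    def teardown(self, pod_dir: str) -> None: ...


class AttachableVolumePlugin(VolumePlugin):
    """Block-style plugins can additionally attach and detach volumes."""

    @abstractmethod
    def attach(self, volume_id: str, node: str) -> str: ...  # returns a device path

    @abstractmethod
    def detach(self, volume_id: str, node: str) -> None: ...


class FakeDiskPlugin(AttachableVolumePlugin):
    def __init__(self):
        self.calls = []

    def attach(self, volume_id, node):
        self.calls.append(("attach", volume_id, node))
        return "/dev/vdb"

    def detach(self, volume_id, node):
        self.calls.append(("detach", volume_id, node))

    def setup(self, volume_id, pod_dir):
        self.calls.append(("setup", volume_id, pod_dir))

    def teardown(self, pod_dir):
        self.calls.append(("teardown", pod_dir))


class FakeNfsPlugin(VolumePlugin):
    def setup(self, volume_id, pod_dir):
        pass

    def teardown(self, pod_dir):
        pass


def needs_attach(plugin: VolumePlugin) -> bool:
    # Only plugins that support attach are handled by the AD Controller.
    return isinstance(plugin, AttachableVolumePlugin)
```

Capability checks via separate interfaces let the controllers skip phases a plugin does not support instead of calling no-op methods.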
PV Controller Implementation
The controller runs two workers:
ClaimWorker: Drives PVC state transitions, using the pv.kubernetes.io/bind-completed annotation to record whether binding has finished.
VolumeWorker: Drives PV state transitions based on whether the PV's ClaimRef is set and on its ReclaimPolicy.
When searching for a PV to bind, the controller checks, in order: VolumeMode, LabelSelector, StorageClassName, AccessMode, and Size.
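The binding checks can be sketched as a filter over unbound PVs. This Python sketch uses simplified, hypothetical PV and PVC types (label selectors reduced to matchLabels, sizes in whole GiB). Choosing the smallest PV that fits is an assumption of this sketch; we believe the real controller prefers the smallest match as well, but treat it as illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PV:
    name: str
    capacity_gi: int
    storage_class: str
    access_modes: Set[str]
    volume_mode: str = "Filesystem"
    labels: Dict[str, str] = field(default_factory=dict)
    claim_ref: Optional[str] = None  # set once the PV is bound


@dataclass
class PVC:
    name: str
    request_gi: int
    storage_class: str
    access_modes: Set[str]
    volume_mode: str = "Filesystem"
    match_labels: Dict[str, str] = field(default_factory=dict)


def matches(pv: PV, pvc: PVC) -> bool:
    """Apply the checks in the order listed above."""
    if pv.volume_mode != pvc.volume_mode:
        return False
    if any(pv.labels.get(k) != v for k, v in pvc.match_labels.items()):
        return False
    if pv.storage_class != pvc.storage_class:
        return False
    if not pvc.access_modes <= pv.access_modes:  # PV must offer every requested mode
        return False
    return pv.capacity_gi >= pvc.request_gi


def find_pv_for_claim(pvc: PVC, pvs: List[PV]) -> Optional[PV]:
    candidates = [pv for pv in pvs if pv.claim_ref is None and matches(pv, pvc)]
    # Smallest fit: avoid spending a large PV on a small claim.
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


pvs = [
    PV("pv-50", 50, "disk", {"ReadWriteOnce"}),
    PV("pv-20", 20, "disk", {"ReadWriteOnce"}),
    PV("pv-10", 10, "disk", {"ReadWriteOnce"}),
]
claim = PVC("disk-pvc", 15, "disk", {"ReadWriteOnce"})
print(find_pv_for_claim(claim, pvs).name)
```

A None result corresponds to the case where no existing PV matches and a new one must be provisioned from the StorageClass.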
AD Controller Details
It maintains two core objects:
DesiredStateOfWorld: Which volumes should be attached to which nodes.
ActualStateOfWorld: Which volumes are currently attached to which nodes.
Two main loops keep them aligned. The desiredStateOfWorldPopulator keeps DesiredStateOfWorld in sync with the Pods and the volumes they reference. The Reconciler compares the desired and actual states and calls the Volume Plugins to attach or detach volumes until the two match.
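The populator/reconciler pair can be sketched with plain sets of (volume, node) pairs. This is a simplified Python sketch: populate_desired and reconcile are hypothetical names, and attach/detach are callbacks standing in for Volume Plugin calls. Detaching before attaching is a choice of this sketch (so a volume moving between nodes is freed first); we understand the real reconciler orders them the same way.

```python
from typing import Callable, Dict, List, Set, Tuple

Attachment = Tuple[str, str]  # (volume, node)


def populate_desired(pods: List[Dict]) -> Set[Attachment]:
    """Simplified populator: every scheduled Pod wants its volumes on its node."""
    return {(vol, pod["node"]) for pod in pods if pod.get("node") for vol in pod["volumes"]}


def reconcile(desired: Set[Attachment], actual: Set[Attachment],
              attach: Callable[[str, str], None],
              detach: Callable[[str, str], None]) -> None:
    # Detach what is attached but no longer wanted.
    for vol, node in sorted(actual - desired):
        detach(vol, node)
        actual.discard((vol, node))
    # Attach what is wanted but not yet attached.
    for vol, node in sorted(desired - actual):
        attach(vol, node)
        actual.add((vol, node))


calls = []
actual = {("vol-a", "node-1")}  # vol-a is still attached to node-1
pods = [{"name": "web-0", "node": "node-2", "volumes": ["vol-a"]}]
reconcile(populate_desired(pods), actual,
          attach=lambda v, n: calls.append(("attach", v, n)),
          detach=lambda v, n: calls.append(("detach", v, n)))
print(calls)
```

Running reconcile again with the same inputs makes no calls, since the actual state already matches the desired state.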
Volume Manager Mechanics
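The Volume Manager works much like the AD Controller (it also keeps desired and actual states and runs a reconciler), but it runs inside the Kubelet on each node and is responsible for mount/unmount. Whether it also performs attach/detach depends on the Kubelet's --enable-controller-attach-detach flag: when the flag is true (the default), attach/detach is left to the AD Controller; when it is false, the Kubelet attaches and detaches volumes itself.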
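The effect of --enable-controller-attach-detach on the Volume Manager can be sketched as a small decision function. This is a hypothetical Python sketch; it assumes the Kubelet learns about controller-side attachment from the volumes the Node object reports as attached (status.volumesAttached), and the step names are illustrative.

```python
from typing import List, Set


def kubelet_volume_steps(volume: str, node_attached: Set[str],
                         controller_attach_detach: bool = True) -> List[str]:
    """Steps the Volume Manager takes for one attachable volume.

    node_attached stands in for the volumes the Node reports as attached
    (hypothetical simplification)."""
    if controller_attach_detach:
        if volume not in node_attached:
            # The AD Controller has not attached it yet; retry on a later pass.
            return ["wait-for-ad-controller"]
        steps = ["verify-attached"]
    else:
        steps = ["attach"]  # the Kubelet attaches the volume itself
    return steps + ["mount-device", "setup"]
```

Either way, mounting (mount-device, then setup) always stays with the Kubelet; only ownership of the attach step moves.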
Volume Plugins Management
In-tree plugins are compiled into Kubernetes and registered by the InitPlugins routine at startup. FlexVolume drivers, by contrast, are discovered dynamically: InitPlugins also registers a Prober that watches the plugin directory (by default /usr/libexec/kubernetes/kubelet-plugins/volume/exec/) for new driver binaries and updates the plugin list as they appear.
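The Prober's job can be approximated with a polling loop. This Python sketch is a stand-in, not the real mechanism: the actual Prober reacts to filesystem notifications, whereas PollingProber (a hypothetical name) compares directory listings between calls and reports driver directories that were added or removed.

```python
import os
from typing import List, Set, Tuple


class PollingProber:
    """Reports driver directories added or removed under plugin_dir
    since the previous probe() call."""

    def __init__(self, plugin_dir: str):
        self.plugin_dir = plugin_dir
        self.known: Set[str] = set()

    def probe(self) -> Tuple[List[str], List[str]]:
        try:
            entries = os.listdir(self.plugin_dir)
        except FileNotFoundError:
            entries = []  # a missing directory simply means no drivers
        current = {e for e in entries
                   if os.path.isdir(os.path.join(self.plugin_dir, e))}
        added = sorted(current - self.known)
        removed = sorted(self.known - current)
        self.known = current
        return added, removed
```

A caller would create PollingProber("/usr/libexec/kubernetes/kubelet-plugins/volume/exec/") once and call probe() periodically, registering each added driver and unregistering each removed one.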
FlexVolume Overview
FlexVolume is an out-of-tree plugin model in which the Kubelet proxies volume operations to an external executable (the driver). Kubelet's FlexVolume plugin implements interfaces such as init, GetVolumeName, Attach, WaitForAttach, MountDevice, Setup, TearDown, Detach, ExpandVolumeDevice, and NodeExpand by calling the driver. When a driver does not implement an operation, it returns a JSON error like:
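{
  "status": "Not supported",
  "message": "error message"
}
FlexVolume drivers live under /usr/libexec/kubernetes/kubelet-plugins/volume/exec/, each in a subdirectory named <vendor>~<driver>. The Kubelet invokes a driver with the operation name and its parameters (options as a JSON string) as command-line arguments and reads the JSON result from the driver's standard output.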
FlexVolume Mount Flow
The process includes:
Attach: the driver calls the storage backend's API to attach the volume to the node, where it appears as a local block device.
MountDevice: formats the device if needed and mounts it to a global path on the node.
Setup: bind-mounts the global path into the Pod's volume directory.
Volumes that need no block device, such as NFS-style file storage, skip Attach and MountDevice and only perform Setup/TearDown.
FlexVolume Code Example
A typical script parses the command‑line argument to dispatch to init, doMount, or doUnmount functions.
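Since a FlexVolume driver is just an executable, it can be written in any language. Below is a minimal Python sketch of such a dispatcher for a file-based volume that needs no attach step. The "server" option name is hypothetical, and do_mount only reports what it would do; a real driver would run the actual mount command and handle more operations.

```python
#!/usr/bin/env python3
"""Minimal FlexVolume driver sketch (file-based volume, no attach).

Invocations handled:
  <driver> init
  <driver> mount <mount-dir> <json-options>
  <driver> unmount <mount-dir>
"""
import json
import sys


def reply(status, message="", **extra):
    out = {"status": status, "message": message}
    out.update(extra)
    return json.dumps(out)


def do_init():
    # Reporting attach: false tells the Kubelet to skip attach/mountdevice.
    return reply("Success", capabilities={"attach": False})


def do_mount(mount_dir, options_json):
    opts = json.loads(options_json)
    server = opts.get("server")  # hypothetical driver-specific option
    if not server:
        return reply("Failure", "missing 'server' option")
    # A real driver would mount the remote filesystem at mount_dir here.
    return reply("Success", f"would mount {server} at {mount_dir}")


def do_unmount(mount_dir):
    return reply("Success", f"would unmount {mount_dir}")


def main(argv):
    if len(argv) < 2:
        return reply("Failure", "no command given")
    cmd, args = argv[1], argv[2:]
    if cmd == "init":
        return do_init()
    if cmd == "mount" and len(args) == 2:
        return do_mount(*args)
    if cmd == "unmount" and len(args) == 1:
        return do_unmount(*args)
    return reply("Not supported", f"unsupported command: {cmd}")


if __name__ == "__main__":
    print(main(sys.argv))
```

To install it, the executable would be placed at <plugin-dir>/<vendor>~<driver>/<driver>, where the Prober picks it up.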
FlexVolume Usage
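A FlexVolume PV specifies the driver (in vendor/driver form), fsType, and driver-specific options under spec.flexVolume. Labels let PVCs select the PV through a label selector, and nodeAffinity restricts which nodes Pods using the PV can be scheduled to.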
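To illustrate the template, this Python sketch builds such a PV as a dictionary and prints it as JSON (kubectl apply accepts JSON as well as YAML). The driver name acme/nas, the label, and the option keys are hypothetical; the field layout (spec.flexVolume with driver, fsType, options, and spec.nodeAffinity) follows the PersistentVolume API.

```python
import json


def flexvolume_pv(name: str, driver: str, size: str, options: dict, node: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name, "labels": {"storage-tier": "nas"}},  # hypothetical label
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            "flexVolume": {
                "driver": driver,    # "<vendor>/<driver>", matching the vendor~driver directory
                "fsType": "nfs",
                "options": options,  # string map handed to the driver as JSON
            },
            # Only Pods scheduled onto this node can use the volume.
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [node],
                }],
            }]}},
        },
    }


pv = flexvolume_pv("nas-pv", "acme/nas", "10Gi", {"server": "10.0.0.1"}, "node-1")
print(json.dumps(pv, indent=2))
```

A PVC can then target this PV with a selector on storage-tier: nas.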
CSI Introduction
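CSI (Container Storage Interface) is a vendor-agnostic standard for container storage plugins. A CSI driver consists of a Controller Server (handling volume creation, deletion, attach, and detach through RPCs such as CreateVolume, DeleteVolume, ControllerPublishVolume, and ControllerUnpublishVolume) and a Node Server (handling NodeStageVolume, NodePublishVolume, etc.). Kubernetes components talk to the driver over gRPC on Unix domain sockets.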
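The ordering between the controller and node RPCs can be shown with an in-memory fake. This Python sketch is hypothetical: method names map to CSI RPCs (noted in comments), but there is no gRPC, signatures are simplified, and it assumes a driver that requires attach. NodeStageVolume corresponds to mounting at the global (staging) path and NodePublishVolume to bind-mounting into the Pod.

```python
from typing import Dict, Set, Tuple


class FakeCSIDriver:
    """In-memory stand-in for a CSI driver's controller and node services."""

    def __init__(self):
        self.volumes: Dict[str, int] = {}              # volume name -> size in bytes
        self.published: Set[Tuple[str, str]] = set()  # (volume_id, node_id)
        self.staged: Dict[str, str] = {}               # volume_id -> staging path
        self.mounts: Dict[str, str] = {}               # target path -> volume_id

    # Controller service
    def create_volume(self, name: str, size_bytes: int) -> str:  # CreateVolume
        # Repeating a call with the same name returns the same volume
        # (a real driver must also reject a conflicting size).
        self.volumes.setdefault(name, size_bytes)
        return name  # this sketch uses the name as the volume ID

    def controller_publish(self, volume_id: str, node_id: str) -> None:  # ControllerPublishVolume
        if volume_id not in self.volumes:
            raise KeyError(volume_id)
        self.published.add((volume_id, node_id))

    # Node service (node_id is a simulation convenience; a real node plugin
    # already runs on its own node)
    def node_stage(self, volume_id: str, node_id: str, staging_path: str) -> None:  # NodeStageVolume
        if (volume_id, node_id) not in self.published:
            raise RuntimeError("volume must be attached before staging")
        self.staged[volume_id] = staging_path

    def node_publish(self, volume_id: str, target_path: str) -> None:  # NodePublishVolume
        if volume_id not in self.staged:
            raise RuntimeError("volume must be staged before publishing")
        self.mounts[target_path] = volume_id


drv = FakeCSIDriver()
vid = drv.create_volume("pvc-123", 20 * 1024**3)
drv.controller_publish(vid, "node-1")
drv.node_stage(vid, "node-1", "/staging/pvc-123")
drv.node_publish(vid, "/pods/uid-1/volumes/pvc-123")
```

Teardown would run the mirror image: NodeUnpublishVolume, NodeUnstageVolume, ControllerUnpublishVolume, and finally DeleteVolume.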
CSI System Structure
Controller Server: Implements the CSI controller RPCs and works with external sidecar components such as external-provisioner, external-attacher, external-resizer, and external-snapshotter.
Node Server: Runs as a DaemonSet on each node; the Kubelet's Volume Manager calls it to mount and unmount volumes.
Node-Driver-Registrar: Registers the CSI driver with the Kubelet by placing a registration socket in a directory the Kubelet watches; the Kubelet then updates the node's annotations and labels.
CSI Objects
VolumeAttachment: Tracks the attachment state of a volume to a node.
CSIDriver: Describes driver capabilities (e.g., attachRequired, podInfoOnMount).
CSINode: Lists the CSI drivers installed on a node.
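A CSIDriver object is small; this Python sketch builds one as a dictionary and shows how its attachRequired field decides whether a VolumeAttachment (and thus an attach step) is needed. The driver name is hypothetical. The sketch assumes the documented default that a driver without a CSIDriver object, or without attachRequired set, is treated as requiring attach.

```python
from typing import Optional


def csi_driver_object(name: str, attach_required: bool, pod_info_on_mount: bool) -> dict:
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "CSIDriver",
        "metadata": {"name": name},  # must match the name the driver reports
        "spec": {
            # False: skip VolumeAttachment and ControllerPublishVolume entirely.
            "attachRequired": attach_required,
            # True: Pod name, namespace, UID, etc. are passed on NodePublishVolume.
            "podInfoOnMount": pod_info_on_mount,
        },
    }


def attach_required(csi_driver: Optional[dict]) -> bool:
    if csi_driver is None:
        return True  # no CSIDriver object: assume attach is required
    return csi_driver["spec"].get("attachRequired", True)


nas = csi_driver_object("nas.csi.example.com", attach_required=False, pod_info_on_mount=True)
```

File-storage drivers typically set attachRequired to false, since there is no block device to attach.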
Node‑Driver‑Registrar Workflow
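Node-Driver-Registrar places a registration socket file in /var/lib/kubelet/plugins_registry (it learns the driver name beforehand through the driver's GetPluginInfo RPC).
The Kubelet discovers the socket and queries it for the driver name and the CSI endpoint.
The Kubelet invokes NodeGetInfo on the CSI driver to obtain node-level details such as the node ID and topology.
The Kubelet updates the node's annotations and labels and creates or updates the CSINode object.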
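The last two steps can be sketched as a function over plain dictionaries. In this Python sketch, register_driver is a hypothetical name, node_get_info is a callback standing in for the NodeGetInfo RPC, and csinode represents just the drivers list of a CSINode. The csi.volume.kubernetes.io/nodeid annotation, which holds a JSON map from driver name to node ID, is the one real-world key used here.

```python
import json
from typing import Callable, Dict

NODE_ID_ANNOTATION = "csi.volume.kubernetes.io/nodeid"


def register_driver(node: Dict, csinode: Dict, driver_name: str,
                    node_get_info: Callable[[], Dict]) -> None:
    """Simplified Kubelet-side handling after the registration socket is found.

    node_get_info returns {'node_id': ..., 'topology': {label_key: value}}."""
    info = node_get_info()
    topology = info.get("topology", {})

    # The annotation holds a JSON map of driver name -> driver-specific node ID.
    ids = json.loads(node["annotations"].get(NODE_ID_ANNOTATION, "{}"))
    ids[driver_name] = info["node_id"]
    node["annotations"][NODE_ID_ANNOTATION] = json.dumps(ids)

    # Topology segments become node labels for topology-aware scheduling.
    node["labels"].update(topology)

    # Record the driver in CSINode, replacing any previous entry for it.
    drivers = [d for d in csinode.get("drivers", []) if d["name"] != driver_name]
    drivers.append({"name": driver_name, "nodeID": info["node_id"],
                    "topologyKeys": sorted(topology)})
    csinode["drivers"] = drivers
```

Re-registering the same driver (for example after the node plugin restarts) overwrites its entry rather than duplicating it.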
External‑Attacher
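Watches VolumeAttachment objects, which the AD Controller creates when a CSI volume must be attached to a node and deletes when it must be detached. When a VolumeAttachment's status.attached is false, the attacher calls the CSI ControllerPublishVolume RPC (attach) and then marks it attached; when the object is being deleted, it calls ControllerUnpublishVolume (detach).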
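One sync pass of the attacher can be sketched over a dictionary shaped like a VolumeAttachment (spec.attacher, spec.nodeName, spec.source.persistentVolumeName, status.attached, status.attachmentMetadata). This Python sketch is simplified: publish and unpublish are callbacks standing in for the CSI RPCs, a real attacher would first resolve the PV to its volumeHandle, and finalizer handling is omitted.

```python
from typing import Callable, Dict


def sync_volume_attachment(va: Dict,
                           publish: Callable[[str, str], Dict[str, str]],
                           unpublish: Callable[[str, str], None]) -> Dict:
    pv = va["spec"]["source"]["persistentVolumeName"]
    node = va["spec"]["nodeName"]
    status = va.setdefault("status", {})

    if va["metadata"].get("deletionTimestamp"):
        # Being deleted: detach if still attached.
        if status.get("attached"):
            unpublish(pv, node)
            status["attached"] = False
    elif not status.get("attached"):
        # Not yet attached: attach and record what the driver returned.
        status["attachmentMetadata"] = publish(pv, node)
        status["attached"] = True
    return va
```

Because the function acts only on a mismatch between the object and the desired outcome, running it repeatedly is safe, which is what a watch-driven controller needs.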
CSI Deployment
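The Controller Server typically runs as a Deployment together with its sidecars, often with two replicas for HA; leader election ensures only one replica is active at a time. The Node Server runs as a DaemonSet on every node, and the Node-Driver-Registrar runs as a sidecar container in the same Pod to register the driver with that node's Kubelet.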
CSI Usage Example
A CSI PV specifies driver, volumeHandle, and volumeAttributes under spec.csi, plus optional nodeAffinity. Once a Pod using it is running, the device (e.g., /dev/vdb) is mounted at a global path on the node, bind-mounted into the Pod's volume directory, and visible inside the container at /data.
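As with the FlexVolume example, the PV can be sketched as a Python dictionary. The driver name, volume handle, and attribute keys are hypothetical, and so is the zone key: the real topology key depends on what the driver reports (topology.kubernetes.io/zone is used here only as an assumption). pod_volume_dir shows the per-Pod path layout we understand recent Kubelets to use for CSI volumes; treat it as illustrative.

```python
from typing import Dict, Optional


def csi_pv(name: str, driver: str, volume_handle: str, size: str,
           attributes: Dict[str, str], zone: Optional[str] = None) -> Dict:
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": driver,                # must match the registered driver name
                "volumeHandle": volume_handle,   # the backend's ID for the volume
                "fsType": "ext4",
                "volumeAttributes": attributes,  # opaque to Kubernetes, passed to the driver
            },
        },
    }
    if zone is not None:
        # Restrict Pods using this PV to nodes in the volume's zone.
        pv["spec"]["nodeAffinity"] = {"required": {"nodeSelectorTerms": [{
            "matchExpressions": [{"key": "topology.kubernetes.io/zone",
                                  "operator": "In", "values": [zone]}]}]}}
    return pv


def pod_volume_dir(pod_uid: str, pv_name: str) -> str:
    # Per-Pod mount point for a CSI volume (illustrative layout).
    return f"/var/lib/kubelet/pods/{pod_uid}/volumes/kubernetes.io~csi/{pv_name}/mount"


pv = csi_pv("disk-pv", "disk.csi.example.com", "vol-123", "20Gi",
            {"type": "ssd"}, zone="zone-a")
```

The container's /data mount is the bind target of this per-Pod directory, which in turn is backed by the global mount of the device.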
Additional CSI Features
Secrets can be supplied at different stages (provisioning, attach, stage, publish, expansion).
Topology-aware scheduling via nodeAffinity.
Block-mode volumes via volumeMode: Block.
CSIDriver settings such as skipAttach (expressed as attachRequired: false) and podInfoOnMount to fine-tune driver behavior.
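Two of these features are configured in ordinary API objects. This Python sketch builds a StorageClass that passes Secrets through the csi.storage.k8s.io/ parameter keys and uses WaitForFirstConsumer so provisioning can honor topology, plus a block-mode PVC and a Pod that consumes it through volumeDevices. Names, the image, and the device path are hypothetical.

```python
from typing import Dict

PREFIX = "csi.storage.k8s.io/"


def storage_class(name: str, provisioner: str, secret: str, namespace: str) -> Dict:
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": {
            # Secret used for CreateVolume/DeleteVolume.
            PREFIX + "provisioner-secret-name": secret,
            PREFIX + "provisioner-secret-namespace": namespace,
            # Secret passed to the driver on NodePublishVolume.
            PREFIX + "node-publish-secret-name": secret,
            PREFIX + "node-publish-secret-namespace": namespace,
        },
        # Delay provisioning until a Pod is scheduled, so topology is respected.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def block_pvc_and_pod(claim: str, sc: str) -> Dict:
    pvc = {
        "apiVersion": "v1", "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim},
        "spec": {"accessModes": ["ReadWriteOnce"], "volumeMode": "Block",
                 "storageClassName": sc,
                 "resources": {"requests": {"storage": "10Gi"}}},
    }
    pod = {
        "apiVersion": "v1", "kind": "Pod", "metadata": {"name": "raw-block-user"},
        "spec": {
            "containers": [{
                "name": "app", "image": "busybox", "command": ["sleep", "3600"],
                # Block volumes appear as a device node: volumeDevices, not volumeMounts.
                "volumeDevices": [{"name": "data", "devicePath": "/dev/xvda"}],
            }],
            "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": claim}}],
        },
    }
    return {"pvc": pvc, "pod": pod}


sc = storage_class("csi-secure", "disk.csi.example.com", "csi-secret", "kube-system")
objs = block_pvc_and_pod("raw-claim", "csi-secure")
```

The application then reads and writes /dev/xvda directly, with no filesystem created by Kubernetes.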
Recent CSI Enhancements
ExpandCSIVolumes – filesystem expansion.
VolumeSnapshotDataSource – snapshot support.
CSIInlineVolume – allows defining CSI volumes directly in a Pod spec.
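Each of these enhancements surfaces as a field in an existing object. This Python sketch shows a PVC restored from a VolumeSnapshot through dataSource, a Pod with an inline CSI volume (which assumes the driver supports the Ephemeral lifecycle mode), and an expansion request made by raising the PVC's storage request (which assumes the StorageClass sets allowVolumeExpansion: true). All names are hypothetical.

```python
from typing import Dict


def pvc_from_snapshot(name: str, snapshot: str, sc: str, size: str) -> Dict:
    return {
        "apiVersion": "v1", "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": sc,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            # Provision the new volume from an existing VolumeSnapshot.
            "dataSource": {"name": snapshot, "kind": "VolumeSnapshot",
                           "apiGroup": "snapshot.storage.k8s.io"},
        },
    }


def pod_with_inline_csi(driver: str, attributes: Dict[str, str]) -> Dict:
    return {
        "apiVersion": "v1", "kind": "Pod", "metadata": {"name": "inline-demo"},
        "spec": {
            "containers": [{"name": "app", "image": "busybox", "command": ["sleep", "3600"],
                            "volumeMounts": [{"name": "scratch", "mountPath": "/data"}]}],
            # No PV or PVC: the CSI volume is declared directly in the Pod spec.
            "volumes": [{"name": "scratch",
                         "csi": {"driver": driver, "volumeAttributes": attributes}}],
        },
    }


def expand_request(pvc: Dict, new_size: str) -> Dict:
    # Expansion is requested by raising the PVC's storage request.
    pvc["spec"]["resources"]["requests"]["storage"] = new_size
    return pvc


restored = pvc_from_snapshot("restored-pvc", "daily-snap", "csi-disk", "10Gi")
inline_pod = pod_with_inline_csi("scratch.csi.example.com", {"size": "1Gi"})
```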
Conclusion
The article covered three major areas: Kubernetes storage architecture (PV, PVC, controllers), FlexVolume plugin mechanics and usage, and CSI framework components, deployment, and advanced features. Understanding these concepts helps developers design, implement, and troubleshoot stateful workloads on Kubernetes.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
