Detailed Overview of LiteIO Architecture, Components, and Volume Lifecycle
This article provides a comprehensive technical overview of LiteIO, describing its core and CSI components, their interactions, the complete volume lifecycle within Kubernetes, common implementation pitfalls, and configuration examples for storage pools and agents.
Building on previous introductions of LiteIO's features and deployment methods, this article delves into the specific functions and collaboration principles of each LiteIO component and explains the lifecycle of a volume within the LiteIO system.
The architecture is divided into two functional groups: core functionality (node-disk-controller, disk-agent, nvmf_tgt) handling resource scheduling and lifecycle management, and CSI functionality (csi-controller, csi-node) that maps core resources to Kubernetes CSI objects such as PersistentVolume and PersistentVolumeClaim.
From a user perspective, the volume lifecycle proceeds through ten steps: (1) pod scheduling, (2) PVC binding, (3) csi‑controller creating a PV, (4) node‑disk‑controller scheduling the new volume, (5) disk‑agent creating the volume on the selected node, (6) kubelet mounting the volume (including remote NVMe connection if needed), (7) pod deletion triggering unmount, (8) csi‑controller deleting the volume, (9) disk‑agent reclaiming storage, and (10) node‑disk‑controller cleaning up in‑memory references.
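As a concrete entry point for steps (1) through (3), the user-facing trigger is an ordinary PVC that references a LiteIO-backed StorageClass. The names below (demo-data, liteio-local) are hypothetical placeholders for illustration, not LiteIO's shipped defaults:

```yaml
# Hypothetical example: StorageClass name and size are illustrative only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: liteio-local   # assumed LiteIO-backed StorageClass
  resources:
    requests:
      storage: 10Gi
```

Once a pod referencing this PVC is scheduled, the csi-controller observes the claim and the remaining steps proceed as listed above.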
Disk‑agent, deployed as a DaemonSet on each storage node, is responsible for discovering or constructing StoragePools, reporting their status, maintaining heartbeats, managing volumes, snapshots, migration tasks, and exposing IO metrics.
Common pitfalls: Error 1 – early implementations used direct RPC calls from node‑disk‑controller to the agent instead of watching CRDs, causing tight coupling, duplicated logic, and network isolation issues. The current design lets both controller and agent interact solely via the API server and CRD state. Error 2 – maintaining a reliable heartbeat is essential; a bug in the heartbeat implementation caused agents to stop renewing their Lease objects. The code assumed the client would return a nil Lease on a failed Update, but the client can return a non‑nil empty object on error, so the cached Lease was never refreshed and every subsequent renewal failed. The problematic snippet is shown below.
func (hs *HeartbeatService) doHeartbeat() (err error) {
    // If hs.lease is nil, get or create it first
    if hs.lease == nil {
        hs.lease, err = k8sLeaseCli.Get(leaseName)
        if err != nil {
            if errors.IsNotFound(err) {
                hs.lease = &coordv1.Lease{...}
                hs.lease, err = k8sLeaseCli.Create(hs.lease)
                if err != nil {
                    hs.lease = nil
                    klog.Error(err)
                }
                return
            } else {
                klog.Error(err)
                return
            }
        }
    }
    // update renew time
    // BUG: on a failed Update, hs.lease is overwritten with the client's
    // return value, which may be a non-nil empty object; the nil check
    // above then never triggers a re-fetch on subsequent heartbeats
    hs.lease.Spec.RenewTime = &renew
    hs.lease, err = k8sLeaseCli.Update(hs.lease)
    if err != nil {
        klog.Error(err)
        return
    }
    return
}

The fix is to reset hs.lease = nil after a failed Update, or to re-fetch the latest Lease before the next heartbeat.
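A minimal, self-contained sketch of the fixed logic is shown below. The Lease type and LeaseClient interface are stand-ins for coordv1.Lease and the Kubernetes Lease client (all names here are illustrative); the fake client only exists to demonstrate that the service recovers after a failed Update.

```go
package main

import (
	"errors"
	"time"
)

// Lease and LeaseClient are minimal stand-ins for coordv1.Lease and the
// Kubernetes Lease client, so the fix can be shown self-contained.
type Lease struct {
	Name      string
	RenewTime time.Time
}

type LeaseClient interface {
	Get(name string) (*Lease, error)
	Create(l *Lease) (*Lease, error)
	Update(l *Lease) (*Lease, error)
}

var errNotFound = errors.New("lease not found")

type HeartbeatService struct {
	cli   LeaseClient
	name  string
	lease *Lease
}

// doHeartbeat renews the Lease. The key difference from the buggy version:
// on a failed Update the cached lease is explicitly reset to nil, so the
// next tick re-fetches a fresh Lease instead of retrying a stale copy.
func (hs *HeartbeatService) doHeartbeat() error {
	if hs.lease == nil {
		l, err := hs.cli.Get(hs.name)
		if errors.Is(err, errNotFound) {
			l, err = hs.cli.Create(&Lease{Name: hs.name})
		}
		if err != nil {
			return err
		}
		hs.lease = l
	}
	hs.lease.RenewTime = time.Now()
	updated, err := hs.cli.Update(hs.lease)
	if err != nil {
		hs.lease = nil // the fix: force a re-fetch on the next heartbeat
		return err
	}
	hs.lease = updated
	return nil
}

// fakeLeaseClient is an in-memory client used only to demonstrate recovery.
type fakeLeaseClient struct {
	stored     *Lease
	failUpdate bool
}

func (f *fakeLeaseClient) Get(name string) (*Lease, error) {
	if f.stored == nil {
		return nil, errNotFound
	}
	c := *f.stored
	return &c, nil
}

func (f *fakeLeaseClient) Create(l *Lease) (*Lease, error) {
	c := *l
	f.stored = &c
	return l, nil
}

func (f *fakeLeaseClient) Update(l *Lease) (*Lease, error) {
	if f.failUpdate {
		// mimic a client that returns a non-nil empty object on error
		return &Lease{}, errors.New("update conflict")
	}
	c := *l
	f.stored = &c
	return l, nil
}

func main() {
	cli := &fakeLeaseClient{}
	hs := &HeartbeatService{cli: cli, name: "disk-agent-node1"}
	_ = hs.doHeartbeat() // creates the Lease
	cli.failUpdate = true
	_ = hs.doHeartbeat() // update fails; cached lease is dropped
	cli.failUpdate = false
	if hs.doHeartbeat() == nil {
		println("recovered after failed update")
	}
}
```

Because the failed Update clears the cache, the next heartbeat takes the Get path and resynchronizes with the API server instead of retrying a stale object forever.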
StoragePool construction can be configured for LVM or SPDK back-ends. An example configuration for an SPDK-based pool using an AIO bdev and LVS is provided (the nesting shown reflects one plausible layout of the flattened original):

storage:
  pooling:
    name: aio-lvs
    mode: SpdkLVStore
  bdev:
    type: aioBdev
    name: test-aio-bdev
    size: 1048576000 # 1000 MiB (~1 GiB)
    filePath: /local-storage/aio-lvs

Node-disk-controller, a centrally deployed component, maintains global resource state, performs volume scheduling, and coordinates migration and snapshot tasks. Metadata synchronization to an external database is implemented via a PlugableReconciler pattern, where each resource reconciler can attach Syncer plugins, and specialized plugins (e.g., LockPoolPlugin) handle resource-specific logic.
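The PlugableReconciler idea can be sketched as follows: a core reconcile step runs first, then each attached Syncer plugin is invoked with the result (for example, a plugin that mirrors metadata to an external database). All type and function names below are illustrative, not LiteIO's actual API.

```go
package main

import "fmt"

// Syncer is a plugin hook invoked after the core reconcile step.
type Syncer interface {
	Sync(obj map[string]string) error
}

// dbSyncer mirrors resource metadata to an external database (stubbed
// here as an in-memory list).
type dbSyncer struct{ synced []string }

func (d *dbSyncer) Sync(obj map[string]string) error {
	d.synced = append(d.synced, obj["name"])
	return nil
}

// PlugableReconciler wraps a core reconcile function with Syncer plugins.
type PlugableReconciler struct {
	reconcile func(obj map[string]string) error
	syncers   []Syncer
}

// Reconcile runs the core logic, then every attached plugin in order.
func (r *PlugableReconciler) Reconcile(obj map[string]string) error {
	if err := r.reconcile(obj); err != nil {
		return err
	}
	for _, s := range r.syncers {
		if err := s.Sync(obj); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db := &dbSyncer{}
	r := &PlugableReconciler{
		reconcile: func(obj map[string]string) error {
			obj["status"] = "ready" // core resource logic
			return nil
		},
		syncers: []Syncer{db},
	}
	_ = r.Reconcile(map[string]string{"name": "pool-node1"})
	fmt.Println(db.synced) // [pool-node1]
}
```

The benefit of this shape is that database synchronization and resource-specific plugins (such as LockPoolPlugin) can be attached or removed without touching the core reconcile logic.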
The built‑in scheduler loads cluster resources into memory via EventHandlers, then filters nodes based on pool status, remaining capacity, and PositionAdvice, followed by priority scoring. Custom filters and priorities can be configured, and two approaches are offered for guaranteeing local‑disk scheduling correctness.
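The filter-then-score flow described above can be sketched like this. The filter and scoring criteria (pool health, free capacity, local-placement preference) follow the article; the Node type and field names are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is an illustrative in-memory view of a storage node.
type Node struct {
	Name      string
	PoolReady bool
	FreeBytes int64
	IsLocal   bool // satisfies a local-placement PositionAdvice
}

// filterNodes drops nodes whose pool is unhealthy or lacks capacity.
func filterNodes(nodes []Node, need int64) []Node {
	var out []Node
	for _, n := range nodes {
		if n.PoolReady && n.FreeBytes >= need {
			out = append(out, n)
		}
	}
	return out
}

// scoreNodes ranks candidates: local placement first, then free capacity.
func scoreNodes(nodes []Node) []Node {
	sort.SliceStable(nodes, func(i, j int) bool {
		if nodes[i].IsLocal != nodes[j].IsLocal {
			return nodes[i].IsLocal
		}
		return nodes[i].FreeBytes > nodes[j].FreeBytes
	})
	return nodes
}

func main() {
	nodes := []Node{
		{"node-a", true, 50 << 30, false},
		{"node-b", true, 20 << 30, true},
		{"node-c", false, 100 << 30, false}, // filtered: pool not ready
	}
	ranked := scoreNodes(filterNodes(nodes, 10<<30))
	fmt.Println(ranked[0].Name) // node-b: local placement wins
}
```

In the real scheduler the candidate set comes from the in-memory state built by EventHandlers, and both the filter chain and the priority functions are configurable.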
CSI components (csi‑controller and csi‑node) implement Create/Delete/Expand Volume and Snapshot APIs, with csi‑node handling StageVolume, PublishVolume, UnpublishVolume, and UnstageVolume operations. Metrics for volume space and inode usage are exposed to Prometheus, and a CLI tool is provided for manual CSI‑node operations.
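The csi-node call sequence named above follows the standard CSI node flow: kubelet calls StageVolume once per node (attach/connect and mount to a staging path), then PublishVolume once per pod (bind mount into the pod's target path), with the Unpublish/Unstage pair reversing them. The sketch below is a stand-in for the real gRPC handlers, with the device and mount operations stubbed out.

```go
package main

import "fmt"

// nodeService tracks staged and published volumes in memory; the real
// handlers would perform NVMe connect, mkfs, and mount operations.
type nodeService struct {
	staged    map[string]string // volumeID -> staging path
	published map[string]string // target path -> volumeID
}

func (s *nodeService) StageVolume(volID, stagingPath string) error {
	// Real code: connect the remote NVMe target if needed, format the
	// device on first use, and mount it at stagingPath.
	s.staged[volID] = stagingPath
	return nil
}

func (s *nodeService) PublishVolume(volID, targetPath string) error {
	if _, ok := s.staged[volID]; !ok {
		return fmt.Errorf("volume %s not staged", volID)
	}
	// Real code: bind-mount the staging path onto targetPath.
	s.published[targetPath] = volID
	return nil
}

func (s *nodeService) UnpublishVolume(targetPath string) {
	delete(s.published, targetPath)
}

func (s *nodeService) UnstageVolume(volID string) {
	delete(s.staged, volID)
}

func main() {
	s := &nodeService{staged: map[string]string{}, published: map[string]string{}}
	_ = s.StageVolume("vol-1", "/var/lib/kubelet/plugins/staging/vol-1")
	err := s.PublishVolume("vol-1", "/var/lib/kubelet/pods/p1/volumes/vol-1")
	fmt.Println(err == nil) // true
}
```

Ordering matters: PublishVolume must fail if the volume was never staged, which is why the sketch checks the staged map before recording the bind mount.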
The article concludes with an invitation to join the LiteIO open‑source community, links to the GitHub repository, and references to previous related posts.
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.