Detailed Overview of LiteIO Architecture, Components, and Volume Lifecycle
This article provides a comprehensive technical overview of LiteIO, describing its core and CSI components, their interactions, the complete volume lifecycle within Kubernetes, common implementation pitfalls, and configuration examples for storage pools and agents.
Building on previous introductions of LiteIO's features and deployment methods, this article delves into the specific functions and collaboration principles of each LiteIO component and explains the lifecycle of a volume within the LiteIO system.
The architecture is divided into two functional groups: core functionality (node-disk-controller, disk-agent, nvmf_tgt) handling resource scheduling and lifecycle management, and CSI functionality (csi-controller, csi-node) that maps core resources to Kubernetes CSI objects such as PersistentVolume and PersistentVolumeClaim.
From a user perspective, the volume lifecycle proceeds through ten steps: (1) pod scheduling, (2) PVC binding, (3) csi‑controller creating a PV, (4) node‑disk‑controller scheduling the new volume, (5) disk‑agent creating the volume on the selected node, (6) kubelet mounting the volume (including remote NVMe connection if needed), (7) pod deletion triggering unmount, (8) csi‑controller deleting the volume, (9) disk‑agent reclaiming storage, and (10) node‑disk‑controller cleaning up in‑memory references.
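As a concrete entry point for steps (1) through (3), the user-facing trigger is an ordinary PVC that references a LiteIO-backed StorageClass. The names below (demo-data, liteio-local) are hypothetical placeholders for illustration, not LiteIO's shipped defaults:

```yaml
# Hypothetical example: StorageClass name and size are illustrative only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: liteio-local   # assumed LiteIO-backed StorageClass
  resources:
    requests:
      storage: 10Gi
```

Once a pod referencing this PVC is scheduled, the csi-controller observes the claim and the remaining steps proceed as listed above.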
Disk‑agent, deployed as a DaemonSet on each storage node, is responsible for discovering or constructing StoragePools, reporting their status, maintaining heartbeats, managing volumes, snapshots, migration tasks, and exposing IO metrics.
Common pitfalls: Error 1 – early implementations used direct RPC calls from node‑disk‑controller to the agent instead of watching CRDs, causing tight coupling, duplicated logic, and network isolation issues. The current design lets both controller and agent interact solely via the API server and CRD state. Error 2 – maintaining a reliable heartbeat is essential; a bug in the heartbeat implementation caused agents to stop renewing their Lease objects. The code assumed the client would return a nil Lease on a failed Update, but the client can return a non‑nil empty object on error, so the cached Lease was never refreshed and every subsequent renewal failed. The problematic snippet is shown below.
func (hs *HeartbeatService) doHeartbeat() (err error) {
    // If hs.lease is nil, get or create it first
    if hs.lease == nil {
        hs.lease, err = k8sLeaseCli.Get(leaseName)
        if err != nil {
            if errors.IsNotFound(err) {
                hs.lease = &coordv1.Lease{...}
                hs.lease, err = k8sLeaseCli.Create(hs.lease)
                if err != nil {
                    hs.lease = nil
                    klog.Error(err)
                }
                return
            } else {
                klog.Error(err)
                return
            }
        }
    }
    // update renew time
    // BUG: on a failed Update, hs.lease is overwritten with the client's
    // return value, which may be a non-nil empty object; the nil check
    // above then never triggers a re-fetch on subsequent heartbeats
    hs.lease.Spec.RenewTime = &renew
    hs.lease, err = k8sLeaseCli.Update(hs.lease)
    if err != nil {
        klog.Error(err)
        return
    }
    return
}

The fix is to reset hs.lease = nil after a failed Update, or to re-fetch the latest Lease before the next heartbeat.
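A minimal, self-contained sketch of the fixed logic is shown below. The Lease type and LeaseClient interface are stand-ins for coordv1.Lease and the Kubernetes Lease client (all names here are illustrative); the fake client only exists to demonstrate that the service recovers after a failed Update.

```go
package main

import (
	"errors"
	"time"
)

// Lease and LeaseClient are minimal stand-ins for coordv1.Lease and the
// Kubernetes Lease client, so the fix can be shown self-contained.
type Lease struct {
	Name      string
	RenewTime time.Time
}

type LeaseClient interface {
	Get(name string) (*Lease, error)
	Create(l *Lease) (*Lease, error)
	Update(l *Lease) (*Lease, error)
}

var errNotFound = errors.New("lease not found")

type HeartbeatService struct {
	cli   LeaseClient
	name  string
	lease *Lease
}

// doHeartbeat renews the Lease. The key difference from the buggy version:
// on a failed Update the cached lease is explicitly reset to nil, so the
// next tick re-fetches a fresh Lease instead of retrying a stale copy.
func (hs *HeartbeatService) doHeartbeat() error {
	if hs.lease == nil {
		l, err := hs.cli.Get(hs.name)
		if errors.Is(err, errNotFound) {
			l, err = hs.cli.Create(&Lease{Name: hs.name})
		}
		if err != nil {
			return err
		}
		hs.lease = l
	}
	hs.lease.RenewTime = time.Now()
	updated, err := hs.cli.Update(hs.lease)
	if err != nil {
		hs.lease = nil // the fix: force a re-fetch on the next heartbeat
		return err
	}
	hs.lease = updated
	return nil
}

// fakeLeaseClient is an in-memory client used only to demonstrate recovery.
type fakeLeaseClient struct {
	stored     *Lease
	failUpdate bool
}

func (f *fakeLeaseClient) Get(name string) (*Lease, error) {
	if f.stored == nil {
		return nil, errNotFound
	}
	c := *f.stored
	return &c, nil
}

func (f *fakeLeaseClient) Create(l *Lease) (*Lease, error) {
	c := *l
	f.stored = &c
	return l, nil
}

func (f *fakeLeaseClient) Update(l *Lease) (*Lease, error) {
	if f.failUpdate {
		// mimic a client that returns a non-nil empty object on error
		return &Lease{}, errors.New("update conflict")
	}
	c := *l
	f.stored = &c
	return l, nil
}

func main() {
	cli := &fakeLeaseClient{}
	hs := &HeartbeatService{cli: cli, name: "disk-agent-node1"}
	_ = hs.doHeartbeat() // creates the Lease
	cli.failUpdate = true
	_ = hs.doHeartbeat() // update fails; cached lease is dropped
	cli.failUpdate = false
	if hs.doHeartbeat() == nil {
		println("recovered after failed update")
	}
}
```

Because the failed Update clears the cache, the next heartbeat takes the Get path and resynchronizes with the API server instead of retrying a stale object forever.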
StoragePool construction can be configured for LVM or SPDK back-ends. An example configuration for an SPDK-based pool using an AIO bdev and LVS is provided (the nesting shown reflects one plausible layout of the flattened original):

storage:
  pooling:
    name: aio-lvs
    mode: SpdkLVStore
  bdev:
    type: aioBdev
    name: test-aio-bdev
    size: 1048576000 # 1000 MiB (~1 GiB)
    filePath: /local-storage/aio-lvs

Node-disk-controller, a centrally deployed component, maintains global resource state, performs volume scheduling, and coordinates migration and snapshot tasks. Metadata synchronization to an external database is implemented via a PlugableReconciler pattern, where each resource reconciler can attach Syncer plugins, and specialized plugins (e.g., LockPoolPlugin) handle resource-specific logic.
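The PlugableReconciler idea can be sketched as follows: a core reconcile step runs first, then each attached Syncer plugin is invoked with the result (for example, a plugin that mirrors metadata to an external database). All type and function names below are illustrative, not LiteIO's actual API.

```go
package main

import "fmt"

// Syncer is a plugin hook invoked after the core reconcile step.
type Syncer interface {
	Sync(obj map[string]string) error
}

// dbSyncer mirrors resource metadata to an external database (stubbed
// here as an in-memory list).
type dbSyncer struct{ synced []string }

func (d *dbSyncer) Sync(obj map[string]string) error {
	d.synced = append(d.synced, obj["name"])
	return nil
}

// PlugableReconciler wraps a core reconcile function with Syncer plugins.
type PlugableReconciler struct {
	reconcile func(obj map[string]string) error
	syncers   []Syncer
}

// Reconcile runs the core logic, then every attached plugin in order.
func (r *PlugableReconciler) Reconcile(obj map[string]string) error {
	if err := r.reconcile(obj); err != nil {
		return err
	}
	for _, s := range r.syncers {
		if err := s.Sync(obj); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db := &dbSyncer{}
	r := &PlugableReconciler{
		reconcile: func(obj map[string]string) error {
			obj["status"] = "ready" // core resource logic
			return nil
		},
		syncers: []Syncer{db},
	}
	_ = r.Reconcile(map[string]string{"name": "pool-node1"})
	fmt.Println(db.synced) // [pool-node1]
}
```

The benefit of this shape is that database synchronization and resource-specific plugins (such as LockPoolPlugin) can be attached or removed without touching the core reconcile logic.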
The built‑in scheduler loads cluster resources into memory via EventHandlers, then filters nodes based on pool status, remaining capacity, and PositionAdvice, followed by priority scoring. Custom filters and priorities can be configured, and two approaches are offered for guaranteeing local‑disk scheduling correctness.
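The filter-then-score flow described above can be sketched like this. The filter and scoring criteria (pool health, free capacity, local-placement preference) follow the article; the Node type and field names are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is an illustrative in-memory view of a storage node.
type Node struct {
	Name      string
	PoolReady bool
	FreeBytes int64
	IsLocal   bool // satisfies a local-placement PositionAdvice
}

// filterNodes drops nodes whose pool is unhealthy or lacks capacity.
func filterNodes(nodes []Node, need int64) []Node {
	var out []Node
	for _, n := range nodes {
		if n.PoolReady && n.FreeBytes >= need {
			out = append(out, n)
		}
	}
	return out
}

// scoreNodes ranks candidates: local placement first, then free capacity.
func scoreNodes(nodes []Node) []Node {
	sort.SliceStable(nodes, func(i, j int) bool {
		if nodes[i].IsLocal != nodes[j].IsLocal {
			return nodes[i].IsLocal
		}
		return nodes[i].FreeBytes > nodes[j].FreeBytes
	})
	return nodes
}

func main() {
	nodes := []Node{
		{"node-a", true, 50 << 30, false},
		{"node-b", true, 20 << 30, true},
		{"node-c", false, 100 << 30, false}, // filtered: pool not ready
	}
	ranked := scoreNodes(filterNodes(nodes, 10<<30))
	fmt.Println(ranked[0].Name) // node-b: local placement wins
}
```

In the real scheduler the candidate set comes from the in-memory state built by EventHandlers, and both the filter chain and the priority functions are configurable.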
CSI components (csi‑controller and csi‑node) implement Create/Delete/Expand Volume and Snapshot APIs, with csi‑node handling StageVolume, PublishVolume, UnpublishVolume, and UnstageVolume operations. Metrics for volume space and inode usage are exposed to Prometheus, and a CLI tool is provided for manual CSI‑node operations.
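The csi-node call sequence named above follows the standard CSI node flow: kubelet calls StageVolume once per node (attach/connect and mount to a staging path), then PublishVolume once per pod (bind mount into the pod's target path), with the Unpublish/Unstage pair reversing them. The sketch below is a stand-in for the real gRPC handlers, with the device and mount operations stubbed out.

```go
package main

import "fmt"

// nodeService tracks staged and published volumes in memory; the real
// handlers would perform NVMe connect, mkfs, and mount operations.
type nodeService struct {
	staged    map[string]string // volumeID -> staging path
	published map[string]string // target path -> volumeID
}

func (s *nodeService) StageVolume(volID, stagingPath string) error {
	// Real code: connect the remote NVMe target if needed, format the
	// device on first use, and mount it at stagingPath.
	s.staged[volID] = stagingPath
	return nil
}

func (s *nodeService) PublishVolume(volID, targetPath string) error {
	if _, ok := s.staged[volID]; !ok {
		return fmt.Errorf("volume %s not staged", volID)
	}
	// Real code: bind-mount the staging path onto targetPath.
	s.published[targetPath] = volID
	return nil
}

func (s *nodeService) UnpublishVolume(targetPath string) {
	delete(s.published, targetPath)
}

func (s *nodeService) UnstageVolume(volID string) {
	delete(s.staged, volID)
}

func main() {
	s := &nodeService{staged: map[string]string{}, published: map[string]string{}}
	_ = s.StageVolume("vol-1", "/var/lib/kubelet/plugins/staging/vol-1")
	err := s.PublishVolume("vol-1", "/var/lib/kubelet/pods/p1/volumes/vol-1")
	fmt.Println(err == nil) // true
}
```

Ordering matters: PublishVolume must fail if the volume was never staged, which is why the sketch checks the staged map before recording the bind mount.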
The article concludes with an invitation to join the LiteIO open‑source community, links to the GitHub repository, and references to previous related posts.
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.