LightPool: An NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Databases
The article presents LightPool, a cloud‑native storage‑pooling solution that combines NVMe over Fabrics (NVMe‑oF), Kubernetes‑based scheduling, and a lightweight user‑space engine to deliver high‑performance, low‑cost, and highly available storage for large‑scale distributed databases while eliminating traditional bottlenecks.
Paper Background
The paper was accepted at the 30th IEEE International Symposium on High‑Performance Computer Architecture (HPCA 2024) and describes LightPool, a novel storage‑pool architecture designed jointly by Alibaba Cloud's server R&D team and Ant Group's data‑infrastructure team.
Paper Interpretation
Facing performance, cost, and stability pressures in cloud‑native database deployments, the authors propose a cloud‑native local storage‑pooling design that matches local‑storage performance while providing elasticity and reducing storage costs.
Distributed Database Storage Choices
Three traditional architectures are discussed: compute‑storage co‑location, compute‑storage separation (ECS + EBS/S3), and shared‑storage solutions. Each has trade‑offs in performance, cost, and resource fragmentation.
The authors’ goal is to eliminate resource fragmentation and improve utilization without sacrificing stability or performance.
LightPool Architecture
LightPool combines cloud‑native design (Kubernetes‑based scheduling, a CSI interface, container deployment), a high‑performance lightweight storage engine (zero‑copy local mount, multi‑media support), and NVMe over Fabrics (TCP or RDMA transports) running on an overlay network.
The cluster consists of control nodes (managing SSD‑pool allocation through Kubernetes) and worker nodes (running application containers, the storage engine, and the CSI plugin). Storage allocation is decoupled from compute scheduling, so pods are placed by CPU/memory demand while disks are attached elastically from the pool.
Scheduling Design
Controller and Agent components manage storage resources, report node health via Kubernetes Lease objects, and implement a scheduling flow modeled on pod scheduling: filtering (basic and affinity filters) followed by priority scoring, with support for custom filters.
Two approaches ensure correct local‑disk scheduling: extending K8s node resources or integrating with the Scheduler Framework for pre‑allocation.
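The filter‑then‑score flow described above can be sketched as follows. The Node fields, the health‑from‑Lease shorthand, and the tightest‑fit scoring policy are illustrative assumptions, not LightPool's actual API.

```go
package main

import "fmt"

// Node is a minimal stand-in for a worker node as seen by a
// storage-aware scheduler extension.
type Node struct {
	Name      string
	FreeBytes int64
	Zone      string
	Healthy   bool // derived from the Agent's Lease renewals
}

// filter drops nodes that cannot host the volume: unhealthy agents,
// insufficient free capacity, or a zone-affinity mismatch.
func filter(nodes []Node, want int64, zone string) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Healthy && n.FreeBytes >= want && (zone == "" || n.Zone == zone) {
			out = append(out, n)
		}
	}
	return out
}

// score prefers the feasible node with the least free space that still
// fits (bin packing), which reduces fragmentation across the pool.
func score(nodes []Node) (best Node, ok bool) {
	for _, n := range nodes {
		if !ok || n.FreeBytes < best.FreeBytes {
			best, ok = n, true
		}
	}
	return
}

func main() {
	pool := []Node{
		{"worker-a", 500 << 30, "az-1", true},
		{"worker-b", 120 << 30, "az-1", true},
		{"worker-c", 80 << 30, "az-1", false},
	}
	// Request 100 GiB in az-1: worker-c is unhealthy, worker-b is the tightest fit.
	if n, ok := score(filter(pool, 100<<30, "az-1")); ok {
		fmt.Println("placing volume on", n.Name)
	}
}
```

Either integration route the paper mentions (extended node resources or the Scheduler Framework) would slot logic like this into the existing pod‑scheduling pipeline rather than running a separate scheduler.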
Storage Engine Design
The engine runs in user space with a lightweight design, uses a custom zero‑copy local storage protocol that bypasses the kernel TCP stack for co‑located access, and supports multiple media types, snapshots, RAID, and hybrid SSD deployments to reduce cost.
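The engine's key data‑path decision can be sketched as a simple dispatch: when the database pod lands on the node that owns the SSD slice, I/O takes the zero‑copy local protocol; otherwise the volume is served over NVMe‑oF. Function and string names here are illustrative, not the engine's real interface.

```go
package main

import "fmt"

// AccessPath sketches the data-path choice described in the paper:
// co-located pods bypass the TCP stack via the zero-copy local protocol,
// everything else goes over NVMe-oF on the overlay network.
func AccessPath(podNode, volumeNode, transport string) string {
	if podNode == volumeNode {
		return "local-zero-copy" // no kernel TCP round trip
	}
	return "nvme-of/" + transport // "tcp" or "rdma"
}

func main() {
	fmt.Println(AccessPath("worker-a", "worker-a", "tcp"))  // local-zero-copy
	fmt.Println(AccessPath("worker-a", "worker-b", "rdma")) // nvme-of/rdma
}
```

This is why the paper can claim local‑storage performance for the common co‑located case while still allowing any volume to be reached remotely.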
High‑Availability Design
LightPool implements hot upgrade (engine updates with sub‑second I/O interruption) and hot migration (moving volume data between nodes without taking the service offline), keeping databases available even in single‑replica deployments.
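A plausible phase breakdown of hot migration, consistent with the sub‑second‑interruption claim, is sketched below: bulk copy runs while the volume stays online, and only the final dirty‑block sync and connection switch happen inside the brief pause. The phase names are my reconstruction, not the paper's terminology.

```go
package main

import "fmt"

// migrate returns an ordered plan for moving a volume between nodes
// while keeping the observed I/O interruption sub-second.
func migrate(vol, src, dst string) []string {
	return []string{
		fmt.Sprintf("allocate %s on %s", vol, dst),
		"background copy (volume stays writable, dirty blocks tracked)",
		"pause I/O (target: sub-second window)",
		"sync remaining dirty blocks",
		fmt.Sprintf("repoint NVMe-oF connection from %s to %s", src, dst),
		"resume I/O and release source",
	}
}

func main() {
	for i, step := range migrate("vol-42", "worker-a", "worker-b") {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

Hot upgrade follows the same pattern with the pause covering an engine restart on one node instead of a cross‑node data move.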
Future Outlook
The authors note the teams' continuing publications at top computer‑architecture venues and outline future work on CXL‑based computing architectures to meet growing AI and large‑memory workloads.
Alibaba Cloud Infrastructure