LightPool: An NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Databases
The article presents LightPool, a cloud‑native storage‑pooling solution that combines NVMe over Fabrics (NVMe‑oF), Kubernetes‑based scheduling, and a lightweight user‑space engine to deliver high‑performance, low‑cost, and highly available storage for large‑scale distributed databases while eliminating traditional bottlenecks.
Paper Background
The paper was accepted at the 30th IEEE International Symposium on High‑Performance Computer Architecture (HPCA 2024) and describes LightPool, a novel storage‑pool architecture designed jointly by Alibaba Cloud's server R&D team and Ant Group's data‑infrastructure team.
Paper Interpretation
Facing performance, cost, and stability pressures in cloud‑native database deployments, the authors propose a cloud‑native local storage‑pooling design that matches local‑storage performance while providing elasticity and reducing storage costs.
Distributed Database Storage Choices
Three traditional architectures are discussed: compute‑storage co‑location, compute‑storage separation (ECS + EBS/S3), and shared‑storage solutions. Each has trade‑offs in performance, cost, and resource fragmentation.
The authors’ goal is to eliminate resource fragmentation and improve utilization without sacrificing stability or performance.
LightPool Architecture
LightPool combines cloud‑native design (Kubernetes‑based scheduling, a CSI interface, container deployment), a high‑performance lightweight storage engine (zero‑copy local mount, multi‑media support), and NVMe over Fabrics (TCP or RDMA transports) running on an overlay network.
The cluster consists of control nodes (managing SSD‑pool allocation through Kubernetes) and worker nodes (running application containers, the storage engine, and the CSI plugin). Storage allocation is decoupled from compute scheduling, so pods are placed by CPU/memory demand while disks are attached elastically from the pool.
Scheduling Design
Controller and Agent components manage storage resources, report node health via Kubernetes Lease objects, and implement a scheduling flow modeled on pod scheduling: filtering (basic and affinity filters) followed by priority scoring, with support for custom filters.
Two approaches ensure correct local‑disk scheduling: extending K8s node resources or integrating with the Scheduler Framework for pre‑allocation.
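The filter‑then‑score flow described above can be sketched as follows. The Node fields, the health‑from‑Lease shorthand, and the tightest‑fit scoring policy are illustrative assumptions, not LightPool's actual API.

```go
package main

import "fmt"

// Node is a minimal stand-in for a worker node as seen by a
// storage-aware scheduler extension.
type Node struct {
	Name      string
	FreeBytes int64
	Zone      string
	Healthy   bool // derived from the Agent's Lease renewals
}

// filter drops nodes that cannot host the volume: unhealthy agents,
// insufficient free capacity, or a zone-affinity mismatch.
func filter(nodes []Node, want int64, zone string) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Healthy && n.FreeBytes >= want && (zone == "" || n.Zone == zone) {
			out = append(out, n)
		}
	}
	return out
}

// score prefers the feasible node with the least free space that still
// fits (bin packing), which reduces fragmentation across the pool.
func score(nodes []Node) (best Node, ok bool) {
	for _, n := range nodes {
		if !ok || n.FreeBytes < best.FreeBytes {
			best, ok = n, true
		}
	}
	return
}

func main() {
	pool := []Node{
		{"worker-a", 500 << 30, "az-1", true},
		{"worker-b", 120 << 30, "az-1", true},
		{"worker-c", 80 << 30, "az-1", false},
	}
	// Request 100 GiB in az-1: worker-c is unhealthy, worker-b is the tightest fit.
	if n, ok := score(filter(pool, 100<<30, "az-1")); ok {
		fmt.Println("placing volume on", n.Name)
	}
}
```

Either integration route the paper mentions (extended node resources or the Scheduler Framework) would slot logic like this into the existing pod‑scheduling pipeline rather than running a separate scheduler.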
Storage Engine Design
The engine runs in user space with a lightweight design, uses a custom zero‑copy local storage protocol that bypasses the kernel TCP stack for co‑located access, and supports multiple media types, snapshots, RAID, and hybrid SSD deployments to reduce cost.
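The engine's key data‑path decision can be sketched as a simple dispatch: when the database pod lands on the node that owns the SSD slice, I/O takes the zero‑copy local protocol; otherwise the volume is served over NVMe‑oF. Function and string names here are illustrative, not the engine's real interface.

```go
package main

import "fmt"

// AccessPath sketches the data-path choice described in the paper:
// co-located pods bypass the TCP stack via the zero-copy local protocol,
// everything else goes over NVMe-oF on the overlay network.
func AccessPath(podNode, volumeNode, transport string) string {
	if podNode == volumeNode {
		return "local-zero-copy" // no kernel TCP round trip
	}
	return "nvme-of/" + transport // "tcp" or "rdma"
}

func main() {
	fmt.Println(AccessPath("worker-a", "worker-a", "tcp"))  // local-zero-copy
	fmt.Println(AccessPath("worker-a", "worker-b", "rdma")) // nvme-of/rdma
}
```

This is why the paper can claim local‑storage performance for the common co‑located case while still allowing any volume to be reached remotely.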
High‑Availability Design
LightPool implements hot upgrade (engine updates with sub‑second I/O interruption) and hot migration (moving volume data between nodes without taking the service offline), keeping databases available even in single‑replica deployments.
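A plausible phase breakdown of hot migration, consistent with the sub‑second‑interruption claim, is sketched below: bulk copy runs while the volume stays online, and only the final dirty‑block sync and connection switch happen inside the brief pause. The phase names are my reconstruction, not the paper's terminology.

```go
package main

import "fmt"

// migrate returns an ordered plan for moving a volume between nodes
// while keeping the observed I/O interruption sub-second.
func migrate(vol, src, dst string) []string {
	return []string{
		fmt.Sprintf("allocate %s on %s", vol, dst),
		"background copy (volume stays writable, dirty blocks tracked)",
		"pause I/O (target: sub-second window)",
		"sync remaining dirty blocks",
		fmt.Sprintf("repoint NVMe-oF connection from %s to %s", src, dst),
		"resume I/O and release source",
	}
}

func main() {
	for i, step := range migrate("vol-42", "worker-a", "worker-b") {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

Hot upgrade follows the same pattern with the pause covering an engine restart on one node instead of a cross‑node data move.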
Future Outlook
The authors note the teams' continuing publications at top computer‑architecture venues and outline future work on CXL‑based computing architectures to meet growing AI and large‑memory workloads.
Alibaba Cloud Infrastructure