
LightPool: A NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Databases

This article introduces LightPool, an open‑source NVMe‑over‑Fabrics storage pool system presented at HPCA 2024. LightPool combines a cloud‑native design, a high‑performance lightweight storage engine, and advanced scheduling to improve resource efficiency, cost, and availability for large‑scale distributed databases.

Aikesheng Open Source Community

A paper titled "LightPool: A NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Database," authored by Alibaba Cloud’s Server R&D team and Ant Group’s Data Infrastructure team, was accepted at the 30th IEEE International Symposium on High‑Performance Computer Architecture (HPCA), held in Edinburgh in March 2024.

The paper examines the performance, cost, and stability pressures that cloud‑native workloads place on distributed databases, and proposes a new cloud‑native local storage pooling architecture that matches the performance of local storage while adding elasticity and reducing storage costs.

LightPool’s architecture consists of control nodes and worker nodes. Control nodes manage the SSD pool, allocating and reclaiming storage, and integrate with Kubernetes (k8s) through a Container Storage Interface (CSI) plugin. Worker nodes run the containers and the LightPool storage engine, and storage resources are attached to them elastically over NVMe‑over‑Fabrics.
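The allocate/reclaim role of the control nodes can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `SSDPool` and `StorageNode` names, the GiB accounting, and the best‑fit placement policy are assumptions for illustration, not LiteIO’s API or LightPool’s actual placement logic. The sketch only decides which node’s SSD backs a volume; attaching that volume to a container over NVMe‑oF is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A worker node contributing local SSD capacity to the pool (hypothetical)."""
    name: str
    capacity_gib: int
    used_gib: int = 0

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - self.used_gib


@dataclass
class SSDPool:
    """Control-node view of pooled SSD capacity (hypothetical, not LiteIO's API)."""
    nodes: dict[str, StorageNode] = field(default_factory=dict)
    # volume id -> (node name, size in GiB)
    volumes: dict[str, tuple[str, int]] = field(default_factory=dict)

    def add_node(self, name: str, capacity_gib: int) -> None:
        self.nodes[name] = StorageNode(name, capacity_gib)

    def allocate(self, volume_id: str, size_gib: int) -> str:
        """Best fit: place the volume on the node with the least free space that still fits."""
        if volume_id in self.volumes:
            raise ValueError(f"volume {volume_id} already allocated")
        candidates = [n for n in self.nodes.values() if n.free_gib >= size_gib]
        if not candidates:
            raise RuntimeError("no node has enough free capacity")
        target = min(candidates, key=lambda n: (n.free_gib, n.name))
        target.used_gib += size_gib
        self.volumes[volume_id] = (target.name, size_gib)
        return target.name

    def release(self, volume_id: str) -> None:
        """Reclaim a volume so its capacity can be reused by later allocations."""
        node_name, size_gib = self.volumes.pop(volume_id)
        self.nodes[node_name].used_gib -= size_gib


if __name__ == "__main__":
    pool = SSDPool()
    pool.add_node("worker-a", 1000)
    pool.add_node("worker-b", 400)
    print(pool.allocate("vol-1", 300))  # worker-b (tightest fit)
    print(pool.allocate("vol-2", 300))  # worker-a (worker-b has only 100 GiB left)
    pool.release("vol-1")
    print(pool.nodes["worker-b"].free_gib)  # 400
```

Best fit keeps larger free regions available on other nodes for big volumes; a spreading policy would be an equally plausible choice, and swapping it in only changes the `min` key.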

The scheduling framework mirrors Kubernetes pod scheduling: it pre‑loads the cluster’s resource state, applies filter plugins (basic and affinity filters) to rule out unsuitable nodes, and ranks the remaining nodes with priority scoring to select the best one. It also supports custom filters and offers two strategies for scheduling local disks: updating node resource extensions, or integrating with the Kubernetes Scheduler Framework.
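The filter‑then‑score flow described above can be sketched in a few lines of Python. This is a simplified model, not the Kubernetes Scheduler Framework API or LightPool’s scheduler: `Node`, `VolumeRequest`, the label‑based affinity check, and the "most free space left" scoring rule are all hypothetical stand‑ins. Custom filters are simply extra functions appended to the filter list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    free_gib: int
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class VolumeRequest:
    size_gib: int
    # Required node labels, a hypothetical stand-in for an affinity rule.
    node_selector: dict[str, str] = field(default_factory=dict)


FilterFn = Callable[[Node, VolumeRequest], bool]
ScoreFn = Callable[[Node, VolumeRequest], float]


def basic_filter(node: Node, req: VolumeRequest) -> bool:
    """Reject nodes that cannot hold the requested capacity."""
    return node.free_gib >= req.size_gib


def affinity_filter(node: Node, req: VolumeRequest) -> bool:
    """Reject nodes whose labels do not match every required key/value pair."""
    return all(node.labels.get(k) == v for k, v in req.node_selector.items())


def free_space_score(node: Node, req: VolumeRequest) -> float:
    """Prefer nodes with the most free space left after placement (spreads load)."""
    return node.free_gib - req.size_gib


def schedule(
    nodes: list[Node],
    req: VolumeRequest,
    filters: list[FilterFn],
    scorers: list[ScoreFn],
) -> Optional[str]:
    # Filter phase: a node must pass every filter, built-in or custom.
    feasible = [n for n in nodes if all(f(n, req) for f in filters)]
    if not feasible:
        return None
    # Score phase: highest total score wins; ties go to the alphabetically first name.
    best = min(feasible, key=lambda n: (-sum(s(n, req) for s in scorers), n.name))
    return best.name


if __name__ == "__main__":
    nodes = [
        Node("worker-a", free_gib=500, labels={"zone": "z1"}),
        Node("worker-b", free_gib=900, labels={"zone": "z2"}),
        Node("worker-c", free_gib=800, labels={"zone": "z1"}),
    ]
    req = VolumeRequest(size_gib=200, node_selector={"zone": "z1"})
    # worker-b fails the affinity filter; worker-c keeps more free space than worker-a.
    print(schedule(nodes, req, [basic_filter, affinity_filter], [free_space_score]))  # worker-c
```

Replacing `free_space_score` with a best‑fit scorer would pack volumes instead of spreading them; in this sketch that policy choice is a one‑line change.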

The storage engine is a lightweight user‑space engine. It supports zero‑copy local storage protocols, multiple storage media (including QLC SSDs and ZNS SSDs), and features such as snapshots and RAID, delivering high throughput with low CPU overhead.
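Snapshots in a storage engine are commonly built on copy‑on‑write or redirect‑on‑write block maps. The summary does not say which scheme LightPool uses, so the following is only a generic redirect‑on‑write sketch in Python: `LayeredVolume`, the 4 KiB block size, and the layer layout are assumptions for illustration. Taking a snapshot costs O(1) because it freezes the existing layers instead of copying data.

```python
from typing import Optional

BLOCK_SIZE = 4096  # bytes per block (illustrative assumption)


class LayeredVolume:
    """Redirect-on-write block map (hypothetical sketch, not LiteIO's format).

    Writes always land in the newest, writable layer. A snapshot freezes every
    existing layer and starts a new empty one, so no data is copied.
    """

    def __init__(self) -> None:
        self.layers: list[dict[int, bytes]] = [{}]  # last layer is writable
        self.snapshots: dict[str, int] = {}  # snapshot name -> frozen layer count

    def write(self, block: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.layers[-1][block] = data

    def read(self, block: int, upto: Optional[int] = None) -> bytes:
        """Read the live volume, or only the first `upto` layers (a snapshot view)."""
        layers = self.layers if upto is None else self.layers[:upto]
        for layer in reversed(layers):  # newest layer holding the block wins
            if block in layer:
                return layer[block]
        return bytes(BLOCK_SIZE)  # never-written blocks read as zeros

    def snapshot(self, name: str) -> None:
        """Freeze all current layers; subsequent writes go to a fresh layer."""
        self.snapshots[name] = len(self.layers)
        self.layers.append({})

    def read_snapshot(self, name: str, block: int) -> bytes:
        return self.read(block, upto=self.snapshots[name])
```

The trade‑off of this layout is read amplification: a read may walk several layers, which real engines typically bound by merging layers in the background.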

High‑availability mechanisms include hot upgrade, which completes in under a second, and hot migration. Together they allow the storage pool to be rebalanced and to recover from node failures without interrupting service.
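Live migration of storage is often implemented with iterative pre‑copy: copy everything while I/O continues, re‑copy blocks dirtied in the meantime, and pause only for the last small set. The summary does not describe LightPool’s migration protocol, so this Python sketch shows that generic technique under stated assumptions; `precopy_migrate`, the `writes_per_round` hook that simulates concurrent writes, and the stopping threshold are hypothetical.

```python
from typing import Callable


def precopy_migrate(
    source: dict[int, bytes],
    writes_per_round: Callable[[int], dict[int, bytes]],
    max_rounds: int = 5,
    stop_threshold: int = 2,
) -> tuple[dict[int, bytes], int]:
    """Copy a live volume to a new node with iterative pre-copy (generic sketch).

    `writes_per_round(r)` is a hypothetical hook returning the writes that hit
    the source while round r was copying. Returns the destination block map
    and the number of blocks copied during the final pause.
    """
    target: dict[int, bytes] = {}
    to_copy = set(source)  # round 0 copies every block
    for r in range(max_rounds):
        for blk in to_copy:
            target[blk] = source[blk]
        # Writes that arrived during this round dirty those blocks again.
        incoming = writes_per_round(r)
        source.update(incoming)  # the live source keeps serving writes
        to_copy = set(incoming)
        if len(to_copy) <= stop_threshold:
            break
    # Final pause: hold new I/O, copy the remaining dirty blocks, then switch over.
    for blk in to_copy:
        target[blk] = source[blk]
    return target, len(to_copy)
```

The pause in the last step is what keeps the interruption short: its length depends on how many blocks are still dirty, not on the size of the volume.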

LightPool is released as the open‑source project LiteIO (https://github.com/eosphoros-ai/liteio), and the paper’s full text is available through the HPCA proceedings.
