LightPool: A NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Databases
This article introduces LightPool, an open‑source NVMe‑over‑Fabric storage pool system presented at HPCA 2024. It combines a cloud‑native design, a high‑performance lightweight storage engine, and a pluggable scheduling framework to improve resource efficiency, cost, and availability for large‑scale distributed databases.
In March 2024, the 30th IEEE International Symposium on High‑Performance Computer Architecture (HPCA) in Edinburgh accepted a paper titled "LightPool: A NVMe‑oF‑Based High‑Performance and Lightweight Storage Pool Architecture for Cloud‑Native Distributed Database", authored by the Alibaba Cloud Server R&D and Ant Data Infrastructure teams.
The paper addresses the challenges faced by distributed databases under cloud‑native workloads, namely performance, cost, and stability pressures, and proposes a novel cloud‑native local storage pooling architecture that matches the performance of local storage while providing elasticity and reducing storage costs.
LightPool’s architecture consists of control nodes and worker nodes. Control nodes manage SSD pools, handle allocation and reclamation, and integrate with Kubernetes (k8s) via a CSI plugin, while worker nodes run containers and the LightPool storage engine, enabling elastic attachment of storage resources through NVMe‑over‑Fabric.
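The allocation and reclamation role of a control node can be illustrated with a small bookkeeping model. This is only a sketch under stated assumptions: the class SsdPool, its fields, and the GiB granularity are hypothetical names chosen for illustration and are not taken from the paper or from the LiteIO code base.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SsdPool:
    """Hypothetical model of one pooled SSD as tracked by a control node."""
    capacity_gib: int
    # Maps volume id -> allocated size in GiB (illustrative bookkeeping only).
    volumes: Dict[str, int] = field(default_factory=dict)

    @property
    def free_gib(self) -> int:
        # Free space is total capacity minus everything currently allocated.
        return self.capacity_gib - sum(self.volumes.values())

    def allocate(self, volume_id: str, size_gib: int) -> bool:
        """Reserve space for a volume; return False if the request cannot fit."""
        if size_gib <= 0 or volume_id in self.volumes or size_gib > self.free_gib:
            return False
        self.volumes[volume_id] = size_gib
        return True

    def reclaim(self, volume_id: str) -> int:
        """Release a volume and return the number of GiB given back to the pool."""
        return self.volumes.pop(volume_id, 0)


if __name__ == "__main__":
    pool = SsdPool(capacity_gib=1000)
    print(pool.allocate("vol-a", 600), pool.free_gib)
    print(pool.allocate("vol-b", 500), pool.free_gib)
    print(pool.reclaim("vol-a"), pool.free_gib)
```

In a real deployment the allocated volume would then be exported to a worker node over NVMe‑over‑Fabric and attached through the CSI plugin; that part is outside the scope of this sketch.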
The scheduling framework mirrors k8s pod scheduling. It pre‑loads cluster resources, applies filter plugins (basic and affinity filters) to discard unsuitable nodes, and then uses priority scoring to select the best remaining node. Custom filters are supported, and local disk scheduling can follow one of two strategies: resource‑extension updates or Scheduler Framework integration.
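The filter‑then‑score flow described above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than LightPool's actual scheduler: the types StorageNode and VolumeRequest, the zone‑based affinity rule, and the "most free space after placement" scoring policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StorageNode:
    """Hypothetical snapshot of one node's pre-loaded resources."""
    name: str
    free_gib: int
    zone: str
    healthy: bool = True


@dataclass
class VolumeRequest:
    """Hypothetical volume request; the field names are assumptions."""
    size_gib: int
    preferred_zone: Optional[str] = None


# A filter plugin returns True when the node may host the request.
Filter = Callable[[StorageNode, VolumeRequest], bool]


def basic_filter(node: StorageNode, req: VolumeRequest) -> bool:
    # Basic check: the node is healthy and has enough free capacity.
    return node.healthy and node.free_gib >= req.size_gib


def affinity_filter(node: StorageNode, req: VolumeRequest) -> bool:
    # Affinity check (assumed rule): honor a preferred zone if one is given.
    return req.preferred_zone is None or node.zone == req.preferred_zone


def score(node: StorageNode, req: VolumeRequest) -> int:
    # Assumed policy: prefer the node with the most free space after placement.
    return node.free_gib - req.size_gib


def schedule(nodes: List[StorageNode], req: VolumeRequest,
             filters: List[Filter]) -> Optional[StorageNode]:
    """Run every filter, then pick the highest-scoring surviving node."""
    candidates = [n for n in nodes if all(f(n, req) for f in filters)]
    if not candidates:
        return None
    # Highest score wins; ties are broken by node name for determinism.
    return min(candidates, key=lambda n: (-score(n, req), n.name))


if __name__ == "__main__":
    cluster = [
        StorageNode("node-a", 100, "z1"),
        StorageNode("node-b", 500, "z2"),
        StorageNode("node-c", 300, "z1"),
    ]
    chosen = schedule(cluster, VolumeRequest(200, "z1"),
                      [basic_filter, affinity_filter])
    print(chosen.name if chosen else "no node fits")
```

A custom filter is just another function with the same signature appended to the filters list, which is the property that makes this style of scheduler easy to extend.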
The storage engine is user‑space, lightweight, and supports zero‑copy local storage protocols, multiple storage media (including QLC SSDs and ZNS), and features such as snapshots and RAID, achieving high throughput and low CPU overhead.
High‑availability mechanisms include hot‑upgrade (sub‑second upgrade time) and hot‑migration, allowing seamless storage pool rebalancing and node failure recovery without service interruption.
LightPool is released as the open‑source project LiteIO (https://github.com/eosphoros-ai/liteio), and the paper’s full text is available through the HPCA proceedings.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source tools and services for MySQL. It releases one high‑quality open‑source component every year on October 24 (1024) and continues to operate and maintain each of them.