
Introducing LiteIO: Open‑Source High‑Performance Cloud‑Native Block Device Service

LiteIO is an open‑source, cloud‑native block device service that leverages NVMe‑oF and SPDK to provide high‑performance, scalable storage for Kubernetes‑based workloads, improving storage utilization and enabling FinOps‑driven cost efficiency across large‑scale production environments.

Aikesheng Open Source Community

Against a backdrop of traditional distributed storage, LiteIO takes a different approach: it is a peer‑to‑peer block device service. It has delivered significant business and technical benefits within Ant Group, and the work has been recognized with a paper at HPCA'24, a CCF‑A class conference.

LiteIO is a high‑performance, easily extensible cloud‑native block device service designed for hyper‑converged Kubernetes architectures, offering stable, efficient, and scalable disk services for Ant Group’s data‑intensive products.

It pools local disks or logical volumes and shares them over the network, using a peer‑to‑peer design to limit hardware‑failure impact and eliminate storage redundancy, thereby increasing usable space.

Design Background: In a FinOps‑focused environment, even a 1% improvement in storage utilization yields substantial cost savings. Traditional distributed storage, however, poses challenges such as uneven utilization, poor scalability, increased replica counts, and large failure domains.

Design Idea: LiteIO adopts a decentralized architecture based on the SPDK engine and the NVMe over Fabrics (NVMe‑oF) protocol. Compute nodes connect directly to remote storage nodes, which achieves near‑local disk performance while keeping Kubernetes‑driven scheduling and fault isolation.
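To make the direct compute‑to‑storage connection concrete, the sketch below builds the `nvme connect` command that a Linux initiator can use to attach a remote NVMe‑oF subsystem. It is illustrative only: the helper name, address, and NQN are hypothetical, and the article does not describe how LiteIO's own components perform the attach.

```python
from typing import List

# Transports accepted by nvme-cli's `connect` subcommand.
_TRANSPORTS = {"tcp", "rdma", "fc", "loop"}


def build_nvme_connect_argv(traddr: str, nqn: str, trsvcid: int = 4420,
                            transport: str = "tcp") -> List[str]:
    """Return the nvme-cli argv that attaches a remote NVMe-oF subsystem.

    Hypothetical helper for illustration; 4420 is the standard NVMe-oF port.
    """
    if transport not in _TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport}")
    return [
        "nvme", "connect",
        f"--transport={transport}",
        f"--traddr={traddr}",      # IP address of the storage node
        f"--trsvcid={trsvcid}",    # port the target listens on
        f"--nqn={nqn}",            # NVMe Qualified Name of the exported subsystem
    ]


if __name__ == "__main__":
    # Example values only; running the command needs root and nvme-cli.
    print(" ".join(build_nvme_connect_argv("10.0.0.12", "nqn.2024-01.io.example:vol-1")))
```

Once such a connection succeeds, the remote namespace appears on the compute node as an ordinary `/dev/nvmeXnY` block device, which is what keeps the I/O path to a single network hop.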

FinOps: By dynamically allocating under‑utilized storage to remote compute nodes and pooling resources globally, LiteIO boosts overall storage utilization.
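The pooling idea can be sketched as a small placement routine: use the compute node's local disk when it has room, otherwise borrow space from the remote node with the most free capacity. This is a minimal sketch under assumed rules; `StorageNode`, `place_volume`, and the local‑first policy are hypothetical and not taken from LiteIO's scheduler.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StorageNode:
    name: str
    capacity_gib: int
    allocated_gib: int = 0

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - self.allocated_gib


def place_volume(nodes: Dict[str, StorageNode], compute_node: str,
                 size_gib: int) -> Optional[str]:
    """Pick a node for a new volume: local disk first, else the emptiest remote node.

    Returns the chosen node name, or None if no node has enough free space.
    """
    local = nodes.get(compute_node)
    if local is not None and local.free_gib >= size_gib:
        chosen = local
    else:
        candidates = [n for n in nodes.values()
                      if n.name != compute_node and n.free_gib >= size_gib]
        if not candidates:
            return None
        chosen = max(candidates, key=lambda n: n.free_gib)
    chosen.allocated_gib += size_gib
    return chosen.name


def pool_utilization(nodes: Dict[str, StorageNode]) -> float:
    """Fraction of the pooled capacity that is currently allocated."""
    total = sum(n.capacity_gib for n in nodes.values())
    used = sum(n.allocated_gib for n in nodes.values())
    return used / total if total else 0.0
```

Without pooling, a request that exceeds the local node's free space would fail even though other nodes sit mostly empty; with pooling, that idle remote capacity absorbs the request.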

General Storage‑Compute Decoupling: LiteIO presents storage as regular block devices, compatible with any application or database (e.g., OceanBase, MySQL, PostgreSQL) without requiring modifications.

Serverless Capability: The service enables seamless scaling by attaching storage to larger compute containers or adding disks for capacity expansion without downtime.

Technical Features: LiteIO uses the high‑performance NVMe‑oF protocol and a simplified I/O path with single‑hop network access, and achieves zero‑copy data transfer through shared memory and DMA remapping. It supports hot upgrade with I/O jitter under 100 ms, hot migration through multi‑round incremental copying, snapshots via CSI for both the LVM and SPDK engines, online volume expansion, multi‑disk aggregation, and thin provisioning for storage over‑commitment.
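Multi‑round incremental copying, the technique named for hot migration, can be illustrated with a small in‑memory simulation. The first round copies every block while the application keeps writing; each later round copies only the blocks dirtied during the previous one; once the dirty set is small enough, I/O is briefly paused and the remainder is copied. The class and function names, the threshold, and the round limit below are all hypothetical and do not reflect LiteIO's actual implementation.

```python
from typing import Callable, List, Set


class TrackedVolume:
    """In-memory stand-in for a block volume that records which blocks were written."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [b"\x00"] * num_blocks
        self.dirty: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)

    def take_dirty(self) -> Set[int]:
        """Return the blocks written since the last call and reset tracking."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def migrate(source: TrackedVolume, target: List[bytes],
            workload: Callable[[int], None],
            final_threshold: int = 2, max_rounds: int = 8) -> int:
    """Copy source into target in rounds; return the number of live rounds.

    workload(round_no) simulates application writes that land during a round.
    """
    source.take_dirty()                       # start tracking from a clean slate
    to_copy = set(range(len(source.blocks)))  # round 1 copies every block
    rounds = 0
    while True:
        rounds += 1
        for i in to_copy:
            target[i] = source.blocks[i]
        workload(rounds)                      # writes racing with this round
        to_copy = source.take_dirty()         # only these need another pass
        if len(to_copy) <= final_threshold or rounds >= max_rounds:
            break
    # Final step: I/O is paused (no workload call), so the last dirty blocks
    # are copied without new writes arriving, and the target is consistent.
    for i in to_copy:
        target[i] = source.blocks[i]
    return rounds
```

The pause is proportional only to the final dirty set, not to the volume size, which is why the approach suits live volumes with a write rate lower than the copy rate.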

Practice and Impact: Deployed on tens of thousands of production servers at Ant Group, LiteIO has increased storage utilization by 25% while adding only about 2.1 µs of latency compared with local storage. It also provides flexible features such as snapshots and multi‑disk aggregation, and integrates seamlessly with the Kubernetes ecosystem.

The project is now open source on GitHub (https://github.com/eosphoros-ai/liteio), and the community is invited to contribute to its further development.
