How PolarFS Achieves Ultra‑Low Latency and High Reliability for Cloud‑Native Databases

PolarFS is a user‑space, ultra‑low‑latency distributed file system designed for POLARDB that leverages RDMA, NVMe SSDs, and a novel ParallelRaft protocol to deliver near‑local‑SSD performance, strong consistency, and seamless failover in a cloud‑native environment.

Alibaba Cloud Developer

Background

PolarFS is a distributed file system built to support POLARDB, a cloud‑native database that separates compute and storage. By moving the I/O stack to user space and exploiting RDMA and NVMe SSDs, PolarFS reduces end‑to‑end latency to levels comparable with a local PCIe SSD.

Design Goals

Separate hardware for compute and storage nodes, allowing independent customization.

Aggregate storage across nodes into a single pool, reducing fragmentation and enabling horizontal scaling.

Provide high availability and reliability for database instances, simplifying migration and failover.

Enable cloud‑database services to benefit from virtualized compute environments and enhanced features such as multi‑read replicas and snapshots.

System Architecture

PolarFS consists of two management layers: virtualized storage resource management (providing logical volumes for each database instance) and metadata management (handling file operations and concurrency).

Key Components

libpfs: a lightweight user‑space library that replaces the standard file‑system interface, keeping the entire I/O path in user space.

PolarSwitch: a daemon on compute nodes that forwards I/O requests to the appropriate ChunkServer.

ChunkServer: runs on storage nodes, manages I/O for each Chunk, uses a hybrid 3D XPoint + NVMe SSD WAL buffer, and replicates writes via a custom ParallelRaft protocol.

PolarCtrl: the control‑plane master that monitors ChunkServers, manages volume creation, chunk layout, and metadata, and performs periodic CRC checks.

System diagram

Storage Organization

PolarFS organizes storage into three layers:

Volume: logical storage space per database instance (10 GB–100 TB) containing filesystem metadata, journal, and Paxos files.

Chunk: the smallest data distribution unit, each stored on a single NVMe SSD (typical size 10 GB), reducing metadata overhead and enabling efficient load balancing.

Block: 64 KB units within a Chunk, dynamically mapped and cached in memory for fast I/O.
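
To make the three layers concrete, the sketch below maps a byte offset within a volume to a chunk, a block within that chunk, and an offset within that block. It is an illustration only: the function name `locate`, the linear layout of chunks across the volume address space, and the reading of "GB" and "KB" as binary units are assumptions, not details taken from PolarFS.

```python
from typing import Tuple

CHUNK_SIZE = 10 * 1024**3   # 10 GB chunk, read here as GiB (assumption)
BLOCK_SIZE = 64 * 1024      # 64 KB block, read here as KiB (assumption)
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE


def locate(volume_offset: int) -> Tuple[int, int, int]:
    """Map a volume byte offset to (chunk index, block index, offset in block).

    Assumes chunks are laid out linearly across the volume address space.
    """
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, offset_in_chunk = divmod(volume_offset, CHUNK_SIZE)
    block_index, offset_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, offset_in_block


def chunks_needed(volume_bytes: int) -> int:
    """Number of chunks required to back a volume of the given size."""
    return -(-volume_bytes // CHUNK_SIZE)  # ceiling division


# A byte just past the first chunk and its first block lands in chunk 1, block 1.
print(locate(CHUNK_SIZE + BLOCK_SIZE + 5))
# Even the largest volume needs only a few thousand chunk-level entries.
print(chunks_needed(100 * 1024**4))
```

Under these assumptions a 100 TB volume is described by roughly ten thousand chunk entries, which illustrates why a large chunk size keeps chunk‑level metadata small, while the 64 KB block mapping stays local to each chunk.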

I/O Flow

A write request from POLARDB travels through libpfs to PolarSwitch, which maps it to the target Chunk and forwards it to the primary ChunkServer. The request is placed in a pre‑allocated buffer, written to the WAL via SPDK, replicated to follower ChunkServers using RDMA, and finally applied to the data block after majority acknowledgment.
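
The ordering of steps in this write path (log first, replicate, apply only after a majority acknowledges) can be sketched with an in‑memory model. Everything here is hypothetical scaffolding: `Replica`, `append_wal`, and `replicated_write` are illustrative names, the WAL is a Python list rather than an SPDK‑backed device, and replication is a plain function call rather than RDMA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    """In-memory stand-in for one ChunkServer replica of a chunk (hypothetical)."""
    name: str
    reachable: bool = True
    wal: List[Tuple[int, int, bytes]] = field(default_factory=list)  # logged entries
    blocks: Dict[int, bytes] = field(default_factory=dict)           # applied data

    def append_wal(self, index: int, block: int, data: bytes) -> bool:
        """Persist a log entry; returns False if the replica cannot be reached."""
        if not self.reachable:
            return False
        self.wal.append((index, block, data))
        return True


def replicated_write(leader: Replica, followers: List[Replica],
                     index: int, block: int, data: bytes) -> bool:
    """Log on the leader, replicate, and apply only after a majority acks."""
    if not leader.append_wal(index, block, data):
        return False
    acks = 1 + sum(f.append_wal(index, block, data) for f in followers)
    majority = (len(followers) + 1) // 2 + 1
    if acks < majority:
        return False            # not committed; the data block is left untouched
    leader.blocks[block] = data  # apply to the data block after commit
    return True


leader = Replica("cs-1")
followers = [Replica("cs-2"), Replica("cs-3", reachable=False)]
print(replicated_write(leader, followers, index=1, block=7, data=b"page"))
```

With three replicas, the write above succeeds even though one follower is unreachable, because two of three acknowledgments form a majority. The sketch omits retries, follower‑side apply, and log cleanup.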

I/O flow diagram

ParallelRaft Protocol

To overcome Raft’s serialization bottleneck under high concurrency, PolarFS introduces ParallelRaft, which relaxes strict ordering while preserving safety properties. Log entries that do not overlap in storage range can be committed and applied out of order; conflicting entries are serialized. A look‑behind buffer records recent LBA modifications to detect conflicts, enabling safe out‑of‑order application.
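
The conflict check described above can be sketched as a range‑overlap test. This is a simplified illustration, not the protocol itself: it assumes each log entry carries a look‑behind buffer that maps the indexes of its recent predecessors to the LBA ranges they modify, and the names `overlaps` and `can_apply` are hypothetical.

```python
from typing import Dict, Set, Tuple

LbaRange = Tuple[int, int]  # half-open range [start, end) of logical block addresses


def overlaps(a: LbaRange, b: LbaRange) -> bool:
    """True if two half-open LBA ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def can_apply(entry_range: LbaRange,
              look_behind: Dict[int, LbaRange],
              applied: Set[int]) -> bool:
    """Decide whether an entry may be applied ahead of missing predecessors.

    look_behind: LBA ranges of the entry's recent predecessors, keyed by log index.
    applied: log indexes this replica has already applied.
    The entry must wait only if an unapplied predecessor touches the same LBAs.
    """
    for prev_index, prev_range in look_behind.items():
        if prev_index not in applied and overlaps(entry_range, prev_range):
            return False
    return True


# Entry 5 writes LBAs [100, 200); entries 3 and 4 precede it.
look_behind = {3: (0, 50), 4: (150, 250)}
print(can_apply((100, 200), look_behind, applied={3}))  # entry 4 conflicts and is missing
print(can_apply((100, 200), look_behind, applied={4}))  # entry 3 is missing but disjoint
```

The second call shows the case that plain Raft would serialize: a gap in the log (entry 3) does not block entry 5, because the two entries touch disjoint LBA ranges.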

Centralized Control with Local Autonomy

PolarCtrl acts as a centralized master for metadata and resource management, while ChunkServers operate autonomously, handling replication and leader election locally via ParallelRaft. Because replication and leader election do not depend on PolarCtrl, this hybrid design keeps the master from becoming a single point of failure on the data path and minimizes metadata I/O.

Performance Evaluation

Benchmarks using Sysbench show that POLARDB on PolarFS achieves write latency close to a single‑node SSD and significantly higher TPS compared to traditional RDS offerings, while maintaining strong data reliability.

Performance comparison

Snapshots and Failover

PolarFS provides instant filesystem snapshots built from per‑ChunkServer local snapshots, enabling rapid logical backups of massive databases. The shared‑access design allows read‑only instances to serve queries without lock contention, and when the read‑write instance fails, a read‑only instance can be promoted to a writable node without data inconsistency.

Conclusion

PolarFS demonstrates that a purpose‑built, user‑space, cloud‑native distributed file system can deliver ultra‑low latency, high availability, and seamless integration with cloud databases, paving the way for future optimizations with emerging hardware such as NVM and FPGA.
