
Unlock Microsecond I/O: Inside PoleFS’s NVMe‑Accelerated, S3‑Backed Cloud‑Native File System

PoleFS is a self‑developed, cloud‑native distributed file system that combines POSIX‑compatible interfaces, a high‑performance NVMe cache layer, and low‑cost S3 object storage to achieve microsecond latency, millions of IOPS, massive directory scalability, multi‑protocol access, and flexible client‑side caching for AI, big data, and container workloads.

360 Zhihui Cloud Developer

PoleFS is a cloud‑native, high‑performance distributed file system developed in‑house that fully complies with POSIX standards. Its proprietary distributed cache architecture is tightly integrated with NVMe storage and delivers microsecond‑level I/O latency and millions of IOPS under concurrent load. Low‑cost S3 object storage persists all data, so the two layers form a tiered “hot NVMe + cold S3” storage model in which performance and capacity scale elastically. Primary use cases include AI training, large‑model workloads, and container platforms.

The system comprises a metadata service, a cache data service, a data storage service, a FUSE client, a Java SDK, and a CSI driver. The metadata service consists of Master and MetaNode nodes: the Master manages volumes and cache information, while MetaNodes store file metadata. Metadata partitions hold inode and dentry information and are spread across a distributed cluster, which makes scaling straightforward and supports extremely large directories.

The data service combines a distributed cache with an S3‑compatible object store. The object store provides low‑cost, high‑capacity persistence, while the cache layer uses consistent hashing to place data on high‑performance nodes, writing three replicas for redundancy and reading from a single replica for efficiency.
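The replica placement described above can be sketched as a consistent‑hash ring. This is a minimal illustration, not PoleFS's implementation: the `HashRing` class, the virtual‑node count, the node names, and the key format are all assumptions introduced for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node appears `vnodes` times on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, count: int = 3):
        """Return up to `count` distinct nodes, walking clockwise from the key."""
        if not self._ring:
            return []
        count = min(count, self._node_count)
        start = bisect.bisect_right(self._points, _hash(key))
        chosen = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


# Hypothetical cache nodes and block key.
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
write_targets = ring.replicas("volume1/inode42/block0")  # three replicas for a write
read_target = write_targets[0]                            # one replica serves a read
```

Because placement depends only on the key and the ring, every client computes the same three targets without a central lookup, and adding or removing a cache node remaps only the keys adjacent to it on the ring.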

Key capabilities include:

Unified storage foundation: object storage via S3 protocol with distributed cache for cost‑effective, elastic performance.

High‑performance caching: consistent‑hash based elastic distributed cache delivering ultra‑low latency and high throughput.

Multiple access interfaces: POSIX, SMB/CIFS, NFS, Java API, HTTP, etc., supporting AI training, big data, logging, backup.

Support for directories containing over a billion files and total file counts in the hundreds of billions.

Response times under 20 ms.

Recycle bin for safe data recovery.

Metadata

PoleFS provides a highly scalable metadata service. Traditional distributed file systems are constrained by static or dynamic subtree partitioning; PoleFS instead shards metadata and manages it hierarchically across a distributed, auto‑balancing cluster, which avoids single‑node capacity bottlenecks and supports massive file counts. The metadata cluster is in‑memory and highly available, stores inode and dentry information per volume partition, and optimizes POSIX operations for performance and manageability.

File data layout follows the JuiceFS model: files are divided into fixed‑size 64 MiB chunks, writes are recorded as slices within a chunk, and 4 MiB blocks serve as the physical storage unit in both the object store and the cache.
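As a rough illustration of the arithmetic behind this layout, the sketch below maps a byte offset in a file to a chunk and block. The function names are hypothetical, and the example simplifies by ignoring slices and assuming blocks are aligned to the start of each chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB chunk, as stated in the text
BLOCK_SIZE = 4 * 1024 * 1024   # 4 MiB block, as stated in the text


def locate(offset: int):
    """Return (chunk index, block index within chunk, byte offset within block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, in_chunk = divmod(offset, CHUNK_SIZE)
    block_index, in_block = divmod(in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, in_block


def blocks_for_range(offset: int, length: int):
    """List the (chunk index, block index) pairs touched by a byte range."""
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    per_chunk = CHUNK_SIZE // BLOCK_SIZE  # 16 blocks per chunk
    return [divmod(block, per_chunk) for block in range(first, last + 1)]
```

For example, a read at byte offset 70 MiB falls in chunk 1, block 1, 2 MiB into that block, and a 2 MiB read starting at 63 MiB touches the last block of chunk 0 and the first block of chunk 1.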

Distributed Cache

The cache layer runs on high‑performance hardware on top of a low‑cost object storage base, delivering high performance at reduced cost. It allows flexible data placement across public and private clouds, and integrates with big‑data and high‑performance computing frameworks for analytics and computation.

It employs a “write‑three‑read‑one” design: triple‑replica writes ensure reliability, while a single‑replica read path provides efficiency. Cache space can be adjusted dynamically, throughput can be rate limited per volume, and writes are flushed asynchronously to the underlying storage. The read cache uses an LRU policy that takes capacity, file count, and expiration into account, avoiding the limitations of evicting on a single dimension. This design improves cache hit rates and enables features such as data pre‑loading.
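An LRU policy bounded on several dimensions at once might look like the following sketch. The text does not describe PoleFS's actual eviction logic, so the `MultiLimitLRU` class, its parameters, and the choice to treat expired entries as misses are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class MultiLimitLRU:
    """LRU cache limited by total bytes, entry count, and entry age (hypothetical)."""

    def __init__(self, max_bytes, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (data, stored_at), least recent first
        self._bytes = 0

    def _drop(self, key):
        data, _ = self._items.pop(key)
        self._bytes -= len(data)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        data, stored_at = entry
        if self._clock() - stored_at > self.ttl:
            self._drop(key)  # expired entries count as misses
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data: bytes):
        if key in self._items:
            self._drop(key)
        self._items[key] = (data, self._clock())
        self._bytes += len(data)
        # Evict least recently used entries until both size limits hold.
        while self._items and (
            self._bytes > self.max_bytes or len(self._items) > self.max_entries
        ):
            self._drop(next(iter(self._items)))
```

Checking the byte limit and the entry limit together means that a workload of many tiny files and a workload of a few large files both stay within bounds, which a single‑dimension limit cannot guarantee.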

Client Multi‑Level Cache

Clients can configure local cache modes that combine the OS cache, process memory, and local disk into a three‑level cache for data and metadata. The client can also adjust the distributed cache size dynamically, so cache capacity can be matched to the workload for maximum performance.
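The read path through such a client‑side cache can be sketched as follows. This is a simplified model under stated assumptions: the `TieredReader` class and the `fetch_remote` callback are hypothetical, the remote tier stands in for the distributed cache or object store, and the OS cache is implicit in ordinary file reads rather than managed by the code.

```python
import hashlib
import os
import tempfile


class TieredReader:
    """Look up data in process memory, then local disk, then a remote tier."""

    def __init__(self, disk_dir, fetch_remote):
        self._memory = {}                  # process-memory level
        self._disk_dir = disk_dir          # local-disk level (reads also hit the OS cache)
        self._fetch_remote = fetch_remote  # fallback: hypothetical remote fetch function
        self.hits = {"memory": 0, "disk": 0, "remote": 0}

    def _path(self, key):
        name = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self._disk_dir, name)

    def read(self, key):
        if key in self._memory:
            self.hits["memory"] += 1
            return self._memory[key]
        path = self._path(key)
        if os.path.exists(path):
            self.hits["disk"] += 1
            with open(path, "rb") as f:
                data = f.read()
        else:
            self.hits["remote"] += 1
            data = self._fetch_remote(key)
            with open(path, "wb") as f:  # populate the disk level for later reads
                f.write(data)
        self._memory[key] = data         # populate the memory level
        return data


with tempfile.TemporaryDirectory() as cache_dir:
    reader = TieredReader(cache_dir, fetch_remote=lambda key: b"data for " + key.encode())
    reader.read("block-0")  # remote fetch, then stored on disk and in memory
    reader.read("block-0")  # served from process memory
    demo_hits = dict(reader.hits)
```

A production client would also bound each level and invalidate entries when other clients modify the data; both are omitted here to keep the lookup order visible.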

By automatically adjusting metadata caching on the client side based on access patterns, PoleFS increases cache hit rates, shortens metadata access paths, and uses a multi‑client notification mechanism to make data updates instantly visible.

Tags: Cloud Native Storage, distributed file system, metadata scaling, high performance I/O, NVMe cache, S3 object storage
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
