
How PoleFS Achieves Microsecond I/O with Multi‑Layer Caching and CTO Consistency

PoleFS is a high‑performance, cloud‑native distributed file system that combines NVMe‑accelerated hot storage with S3‑based cold storage, offering multiple client access methods, multi‑level metadata and data caches, prefetch/warm‑up strategies, and a Close‑to‑Open consistency model to balance performance and data correctness.

360 Smart Cloud

Before getting to client‑side caching, some context: PoleFS, a high‑performance, cloud‑native distributed file system developed by 360 Smart Cloud, uses a layered storage architecture that couples microsecond‑level I/O latency on NVMe hot storage with the massive, low‑cost capacity of S3 object storage, giving it both performance and elastic capacity.

Client Access Methods

PoleFS acts as a user‑space file system and supports several entry points to suit different workloads:

FUSE: Implements full POSIX semantics via the user‑space FUSE interface, allowing applications to interact with PoleFS just like a local file system.

Kubernetes CSI Driver: Enables containerized environments to mount storage through native PVCs, supporting pod‑to‑pod data sharing.

Hadoop Java SDK: Offers an HDFS‑compatible API for big‑data ecosystems, letting all Hadoop components store data in the underlying object store while benefiting from elastic scaling and low cost.

S3 Gateway: Exposes a standard S3 API so existing S3‑aware applications can connect without code changes.
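To make the FUSE entry point concrete, here is a minimal Go sketch that reads and writes through a PoleFS mount exactly as it would a local directory; the /mnt/polefs mount path and file names are placeholders, not PoleFS defaults.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	// Hypothetical mount point; replace with wherever PoleFS is mounted via FUSE.
	mountPoint := "/mnt/polefs"

	// Standard POSIX-style file I/O works unchanged against the FUSE mount.
	path := filepath.Join(mountPoint, "datasets", "sample.txt")
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(path, []byte("hello polefs\n"), 0o644); err != nil {
		log.Fatal(err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes back through the FUSE mount\n", len(data))
}
```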

Caching Architecture

PoleFS introduces multi‑level caches to accelerate both metadata and data paths.

Metadata Cache

Two layers are provided:

Kernel Metadata Cache: Stores inode attributes and directory entries in kernel space, reducing user‑kernel transitions. Its validity is controlled by FUSE timeout and flag parameters, and it is populated via lookup, getattr, and readdir calls.

Client‑side Memory Metadata Cache: Caches file open state, including generic attributes and PoleFS‑specific mapping information that links a file to its underlying data objects. When enabled, subsequent attribute or mapping queries hit this cache, avoiding remote metadata service calls.
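As an illustration of how a client‑side memory metadata cache with a validity window behaves, the sketch below caches attributes under a TTL and reports a miss once the entry expires, at which point the caller would fall back to the remote metadata service. The Attr type, its fields, and the TTL value are assumptions for illustration, not PoleFS internals.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Attr stands in for cached inode attributes plus the PoleFS-specific
// mapping from a file to its underlying data objects.
type Attr struct {
	Size    int64
	Objects []string // illustrative object IDs backing the file
}

type entry struct {
	attr    Attr
	expires time.Time
}

// MetaCache keeps attributes for a configurable validity window (TTL);
// outside that window callers must go back to the remote metadata service.
type MetaCache struct {
	mu  sync.RWMutex
	ttl time.Duration
	m   map[string]entry
}

func NewMetaCache(ttl time.Duration) *MetaCache {
	return &MetaCache{ttl: ttl, m: make(map[string]entry)}
}

// Get returns a cached attribute only while it is still valid.
func (c *MetaCache) Get(path string) (Attr, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[path]
	if !ok || time.Now().After(e.expires) {
		return Attr{}, false
	}
	return e.attr, true
}

// Put stores the result of a remote lookup/getattr call.
func (c *MetaCache) Put(path string, a Attr) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[path] = entry{attr: a, expires: time.Now().Add(c.ttl)}
}

func main() {
	c := NewMetaCache(30 * time.Second)
	c.Put("/mnt/polefs/model/ckpt-001", Attr{Size: 1 << 30})
	if a, ok := c.Get("/mnt/polefs/model/ckpt-001"); ok {
		fmt.Println("metadata cache hit, size:", a.Size) // avoids a remote metadata call
	}
}
```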

Data Cache

Three layers are available: client memory cache, optional client disk cache, and a distributed cache cluster.

Writes land in the client memory cache and are acknowledged immediately; the data is then asynchronously flushed either to the local disk cache (if enabled) or directly to the distributed cache, after which the backend metadata is updated. Data in the disk cache is in turn flushed to the distributed cache and finally to object storage. The local disk cache is disabled by default, both to avoid tying reliability to local disks and to keep data visibility consistent across nodes; users with high‑performance local disks can enable write‑back caching.
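The write path can be sketched roughly as follows: the write is buffered in memory and acknowledged at once, while a background loop flushes batches to the next tier and updates metadata. The tier functions, flush interval, and struct layout are placeholders for illustration, not PoleFS APIs.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// writeBuffer models the client memory cache on the write path.
type writeBuffer struct {
	mu      sync.Mutex
	pending [][]byte
}

// Write appends data to the memory cache and returns immediately;
// durability to lower tiers happens asynchronously.
func (b *writeBuffer) Write(p []byte) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	buf := make([]byte, len(p))
	copy(buf, p)
	b.pending = append(b.pending, buf)
	return len(p)
}

// flushLoop drains the buffer to the next tier: the optional local disk
// cache if enabled, otherwise the distributed cache, then updates metadata.
func (b *writeBuffer) flushLoop(diskCacheEnabled bool, done <-chan struct{}) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			b.mu.Lock()
			batch := b.pending
			b.pending = nil
			b.mu.Unlock()
			for _, chunk := range batch {
				if diskCacheEnabled {
					flushToDiskCache(chunk) // later flushed onward to the distributed cache
				} else {
					flushToDistributedCache(chunk)
				}
				updateBackendMetadata(len(chunk))
			}
		}
	}
}

// Placeholder tier writers standing in for the real flush targets.
func flushToDiskCache(p []byte)        { fmt.Printf("disk cache <- %d bytes\n", len(p)) }
func flushToDistributedCache(p []byte) { fmt.Printf("distributed cache <- %d bytes\n", len(p)) }
func updateBackendMetadata(n int)      { fmt.Printf("metadata updated (+%d bytes)\n", n) }

func main() {
	b := &writeBuffer{}
	done := make(chan struct{})
	go b.flushLoop(false, done)
	b.Write([]byte("fast, acknowledged immediately"))
	time.Sleep(300 * time.Millisecond) // give the async flush a chance to run
	close(done)
}
```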

Read requests first probe the client memory cache, then the optional disk cache, and finally the distributed cache; a cache miss at all levels triggers a download from object storage, after which the data is cached locally for future accesses. Because reads must fall back to object storage when all caches miss, read latency can increase sharply, making cache‑hit rate optimization critical.
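A simplified read‑through sketch of this probe order, assuming a generic Tier interface rather than PoleFS's actual cache interfaces: each miss falls to the next level, and a full miss fetches from object storage and back‑fills the caches for later reads.

```go
package main

import (
	"errors"
	"fmt"
)

// Tier is any cache level that can be probed for a block.
type Tier interface {
	Name() string
	Get(key string) ([]byte, bool)
	Put(key string, val []byte)
}

// readThrough probes memory, disk, and distributed caches in order; on a full
// miss it downloads from object storage and back-fills the caches.
func readThrough(key string, tiers []Tier, fetchFromS3 func(string) ([]byte, error)) ([]byte, error) {
	for i, t := range tiers {
		if val, ok := t.Get(key); ok {
			fmt.Printf("hit at %s\n", t.Name())
			// Promote into the faster tiers that already missed.
			for _, upper := range tiers[:i] {
				upper.Put(key, val)
			}
			return val, nil
		}
	}
	val, err := fetchFromS3(key)
	if err != nil {
		return nil, err
	}
	for _, t := range tiers {
		t.Put(key, val) // cache for future accesses
	}
	return val, nil
}

// mapTier is a trivial in-memory stand-in for a real cache tier.
type mapTier struct {
	name string
	m    map[string][]byte
}

func (t *mapTier) Name() string                { return t.name }
func (t *mapTier) Get(k string) ([]byte, bool) { v, ok := t.m[k]; return v, ok }
func (t *mapTier) Put(k string, v []byte)      { t.m[k] = v }

func main() {
	tiers := []Tier{
		&mapTier{"client memory cache", map[string][]byte{}},
		&mapTier{"client disk cache", map[string][]byte{}},
		&mapTier{"distributed cache", map[string][]byte{}},
	}
	fetch := func(key string) ([]byte, error) {
		if key == "missing" {
			return nil, errors.New("object not found")
		}
		fmt.Println("falling back to object storage for", key)
		return []byte("object data"), nil
	}
	if _, err := readThrough("obj-0001", tiers, fetch); err != nil {
		panic(err)
	}
	readThrough("obj-0001", tiers, fetch) // second read hits the memory cache
}
```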

Prefetch improves read cache hit rates by detecting sequential I/O patterns and asynchronously loading subsequent data objects into the cache. In random‑read scenarios, prefetch can still download whole objects, but for large sparse reads the benefit diminishes and cache pressure grows.
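A rough sketch of how sequential‑pattern detection and asynchronous prefetch might be structured; the object size, prefetch window, and detection rule are illustrative assumptions, not the actual PoleFS heuristics.

```go
package main

import (
	"fmt"
	"sync"
)

// prefetcher watches per-file read offsets; when it detects sequential access
// it asynchronously loads the following data objects into the cache.
type prefetcher struct {
	mu         sync.Mutex
	lastEnd    map[string]int64 // where the previous read on each file ended
	objectSize int64
	window     int // how many objects ahead to load
	load       func(file string, objectIndex int64)
}

func (p *prefetcher) onRead(file string, offset int64) {
	p.mu.Lock()
	last, seen := p.lastEnd[file]
	sequential := seen && offset == last // picks up exactly where the last read ended
	p.lastEnd[file] = offset + p.objectSize
	p.mu.Unlock()

	if !sequential {
		return // random access: skip prefetch to limit cache pressure
	}
	cur := offset / p.objectSize
	for i := int64(1); i <= int64(p.window); i++ {
		go p.load(file, cur+i) // asynchronous: does not block the foreground read
	}
}

func main() {
	var wg sync.WaitGroup
	p := &prefetcher{
		lastEnd:    map[string]int64{},
		objectSize: 4 << 20, // assume 4 MiB data objects (illustrative)
		window:     2,
		load: func(file string, idx int64) {
			defer wg.Done()
			fmt.Printf("prefetching %s object #%d into the cache\n", file, idx)
		},
	}
	p.onRead("train.tfrecord", 0) // first read: no pattern yet, no prefetch
	wg.Add(p.window)
	p.onRead("train.tfrecord", 4<<20) // sequential: prefetch the next two objects
	wg.Wait()
}
```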

Warm‑up (pre‑loading) allows users to proactively load frequently accessed files into the distributed or disk cache before a workload starts—for example, pre‑warming training datasets in AI workloads to boost GPU utilization.
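A minimal warm‑up sketch, assuming a user‑supplied file list and a placeholder loadIntoCache hook standing in for whatever preload mechanism the deployment exposes (distributed cache or local disk cache); paths and parallelism are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// warmUp pre-loads a list of files into the cache before the workload starts,
// e.g. warming a training dataset so GPUs are not stalled on object-storage
// reads. loadIntoCache is a placeholder for the deployment's preload hook.
func warmUp(files []string, parallelism int, loadIntoCache func(string) error) {
	sem := make(chan struct{}, parallelism) // bound concurrent downloads
	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{}
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := loadIntoCache(path); err != nil {
				fmt.Printf("warm-up of %s failed: %v\n", path, err)
			}
		}(f)
	}
	wg.Wait()
}

func main() {
	dataset := []string{"/mnt/polefs/train/part-0000", "/mnt/polefs/train/part-0001"}
	warmUp(dataset, 8, func(path string) error {
		fmt.Println("preloading", path, "into the distributed cache")
		return nil
	})
}
```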

[Figure: PoleFS caching layers]

Cache Consistency Model

PoleFS adopts a Close‑to‑Open (CTO) consistency model, a weak consistency approach common in distributed file systems. After a client modifies a file, the changes become visible to other clients only after the modifying client calls close(), which forces a flush of data and metadata to the backend store. Subsequent open() calls by other clients trigger a reload of the latest metadata, ensuring that the typical open‑read/write‑close sequence sees a consistent view.
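The sequence below illustrates the CTO contract with plain POSIX calls on a hypothetical PoleFS mount: the writer's changes are only guaranteed to be visible to readers whose open() happens after the writer's close(). The path is a placeholder, and in practice the two clients would run on different nodes.

```go
package main

import (
	"log"
	"os"
)

// Client A: the write side of a Close-to-Open sequence. Changes are only
// guaranteed to be flushed to the backend once Close returns.
func writerA(path string) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := f.Write([]byte("new contents")); err != nil {
		log.Fatal(err)
	}
	// close() forces data and metadata to the backend store; before this point
	// other clients may still observe the old version.
	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}

// Client B: a reader on another node. An open() issued after writerA's close()
// reloads the latest metadata, so the read below sees the new contents.
// An open() issued before that close() may legitimately return stale data.
func readerB(path string) []byte {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	buf := make([]byte, 64)
	n, _ := f.Read(buf)
	return buf[:n]
}

func main() {
	path := "/mnt/polefs/shared/config.json" // placeholder path on a PoleFS mount
	writerA(path)
	_ = readerB(path)
}
```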

When client‑side memory caches are enabled, open operations may bypass the backend during the cache’s validity window, reducing latency but potentially violating CTO guarantees for a short period. Users must therefore balance cache usage against consistency requirements.

Guidelines for configuring caches:

For workloads with frequent updates and multi‑node sharing, disable client memory metadata cache or shorten its TTL to preserve cross‑node consistency.

For read‑only or non‑shared scenarios, enabling both data and metadata caches can dramatically lower latency and improve performance.
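As a sketch of these two guidelines, the configuration profiles below contrast a shared, write‑heavy setup with a read‑only dataset setup; the field names and TTL values are illustrative, not actual PoleFS option names.

```go
package main

import (
	"fmt"
	"time"
)

// CacheConfig groups the knobs discussed above. Field names are illustrative,
// not actual PoleFS option names.
type CacheConfig struct {
	MetaCacheEnabled bool
	MetaCacheTTL     time.Duration
	DataCacheEnabled bool
	DiskCacheEnabled bool
}

// sharedWriteHeavy favors cross-node consistency: metadata caching off (or
// with a very short TTL) so other clients see updates promptly.
func sharedWriteHeavy() CacheConfig {
	return CacheConfig{
		MetaCacheEnabled: false,
		MetaCacheTTL:     0,
		DataCacheEnabled: true,
		DiskCacheEnabled: false,
	}
}

// readOnlyDataset favors latency: aggressive metadata and data caching is safe
// because the files are not being modified by other nodes.
func readOnlyDataset() CacheConfig {
	return CacheConfig{
		MetaCacheEnabled: true,
		MetaCacheTTL:     5 * time.Minute,
		DataCacheEnabled: true,
		DiskCacheEnabled: true,
	}
}

func main() {
	fmt.Printf("shared/write-heavy: %+v\n", sharedWriteHeavy())
	fmt.Printf("read-only dataset:  %+v\n", readOnlyDataset())
}
```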

By combining CTO consistency with flexible cache controls, PoleFS delivers high throughput for concurrent accesses while allowing applications to tune the trade‑off between performance and data freshness.
