
How PoleFS’s BlobCache Powers High‑Performance Cloud‑Native Distributed Storage

This article details the architecture, key concepts, features, and design formulas of PoleFS’s BlobCache distributed caching subsystem, explaining how it achieves high performance, reliability, scalability, multi‑tenant isolation, and elastic capacity within a cloud‑native file system.

360 Zhihui Cloud Developer

1. Introduction

PoleFS is a self‑developed, cloud‑native, high‑performance distributed file system comprising a client, metadata service, distributed cache service, and data service. This article focuses on the distributed cache service, describing its system architecture, characteristics, and design.

Key terminology:

Volume: A logical unit comprising metadata and data. From the client's perspective it appears as an accessible file system instance; on the storage side it maps to a cache collection and an OSS bucket.

Collection: The cache set created for a volume, made up of multiple VIDs; it corresponds one-to-one with the volume and its OSS bucket.

VID: The basic unit of data management.

Needle: The smallest storage unit, corresponding to one OSS object.

2. Distributed Cache System Architecture

BlobCache is the distributed cache subsystem of PoleFS, positioned between the client and OSS (S3‑compatible object storage). It provides high performance, reliability, and availability. BlobCache consists of a master component managing cluster and cache metadata (using Raft for consistency) and volume server components handling data storage, flushing, and access.

PoleFS overall architecture diagram:

3. Distributed Cache Features

Read/write cache separation with configurable sizes.

Elastic cache that adjusts space based on actual usage.

High availability with self‑healing mechanisms.

High reliability using a write‑three‑read‑one design and CRC checks.

Horizontal scalability.

Multi‑tenant isolation.

4. Distributed Cache Design

4.1 Key Structures

4.1.1 Collection

A collection is a cache set for a volume, composed of several VIDs (three replicas: one leader, two followers). VIDs are evenly distributed across volume servers based on server weight, which correlates with remaining logical space.

Example: A collection with 10 VIDs is illustrated below.

Formula for the number of VIDs in a collection:

vidCount = ⌊cacheCap / cacheBase⌋ * vidBase + vidBase

where vidCount is the number of VIDs, cacheCap is the user-configured total cache size, cacheBase is the system's base cache size, and vidBase is the base number of VIDs.

Example: For a 150 GB cache (write 50 GB, read 100 GB) with cacheBase=100 GB and vidBase=10, vidCount = ⌊150/100⌋*10+10 = 20. Each VID then gets 2.5 GB write cache and 5 GB read cache.
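The arithmetic can be spelled out in a few lines. The sketch below is a minimal worked example of the formula above, using the article's notation (cacheCap, cacheBase, vidBase); the function name is illustrative and not taken from PoleFS code.

```go
// Minimal worked example of the vidCount formula; names are illustrative.
package main

import (
	"fmt"
	"math"
)

// vidCount = floor(cacheCap / cacheBase) * vidBase + vidBase
func vidCount(cacheCapGB, cacheBaseGB float64, vidBase int) int {
	return int(math.Floor(cacheCapGB/cacheBaseGB))*vidBase + vidBase
}

func main() {
	n := vidCount(150, 100, 10) // floor(150/100)*10 + 10 = 20
	fmt.Println("vidCount:", n)
	// Per-VID shares for the 50 GB write / 100 GB read split:
	fmt.Printf("write/VID: %.1f GB, read/VID: %.1f GB\n", 50.0/float64(n), 100.0/float64(n))
}
```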

4.1.2 VID

A VID contains write cache, read cache (only on the leader), and a manifest file. Write cache uses multiple buffers (buf), each holding a pair of dat (data) and idx (index) files; the smallest unit inside a buf is a needle. Read cache consists of files that map one‑to‑one with OSS objects. The manifest records VID metadata.

Formula for the number of write buffers per VID:

bufCount = writeCacheCap / datCapLimit

where writeCacheCap is the write cache capacity of the VID and datCapLimit is the system-configured size limit of a dat file.

Example: With a write cache of 2.5 GB and datCapLimit=0.1 GB, bufCount = 2.5 / 0.1 = 25.
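To make the per-VID layout and the buffer arithmetic concrete, here is a minimal sketch in Go; the struct and field names (Buf, VID, DatPath, and so on) are assumptions for illustration, not PoleFS's actual definitions.

```go
// Illustrative sketch of the per-VID layout described above.
package blobcache

import "math"

// Buf pairs a dat (data) file with its idx (index) file; needles are appended to the dat file.
type Buf struct {
	DatPath string // needle payloads
	IdxPath string // needle offsets and sizes
}

// VID groups multiple write buffers, a leader-only read cache, and a manifest.
type VID struct {
	WriteBufs []Buf             // filled sequentially; flushed when full
	ReadCache map[string]string // OSS object key -> local cache file (leader only)
	Manifest  string            // path of the manifest recording VID metadata
}

// bufCount = writeCacheCap / datCapLimit, e.g. 2.5 GB / 0.1 GB = 25.
func bufCount(writeCacheCapGB, datCapLimitGB float64) int {
	return int(math.Round(writeCacheCapGB / datCapLimitGB))
}
```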

4.2 Data Read/Write

Read/write operations use consistent hashing to balance load. Each VID occupies a range on the hash ring; requests are routed to the appropriate VID leader based on the key’s hash.

Write: The leader writes data and replicates to followers; success is returned only after all replicas confirm. Written data is later flushed to read cache.

Read: The leader reads from write cache, then read cache, and finally OSS if needed; data fetched from OSS is asynchronously cached.

Example: Data is split into 1 MB blocks, each assigned a unique key. The key’s hash determines the target VID; the client obtains the leader from the master and performs read/write.
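A minimal sketch of that routing step is shown below, assuming a sorted hash ring and CRC32 as a stand-in hash function; BlobCache's actual ring layout and hash choice are not specified in this article.

```go
// Minimal hash-ring lookup sketch; ring layout and hash function are assumptions.
package blobcache

import (
	"hash/crc32"
	"sort"
)

// ring maps points on the hash ring to the VID that owns the range ending there.
type ring struct {
	points []uint32          // sorted ring positions
	owner  map[uint32]string // ring position -> VID (leader resolved via the master)
}

// route returns the VID responsible for a block key: the first ring point
// clockwise from the key's hash, wrapping around at the end of the ring.
func (r *ring) route(blockKey string) string {
	h := crc32.ChecksumIEEE([]byte(blockKey))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around
	}
	return r.owner[r.points[i]]
}
```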

During writes, buffers are filled sequentially; when a buffer reaches its limit, it is flushed and a new buffer is created.
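A rough sketch of that sequential fill-and-flush behavior, with hypothetical names (writeCache, appendNeedle) and sizes tracked in bytes:

```go
// Sequential buffer filling sketch; names and structure are illustrative.
package blobcache

type writeBuffer struct {
	used int64 // bytes currently held in this buffer's dat file
}

type writeCache struct {
	bufs        []*writeBuffer
	datCapLimit int64 // system-configured max size of one dat file, in bytes
}

// appendNeedle records a needle of the given size, sealing the current buffer
// and opening a new one once the dat file limit would be exceeded.
func (w *writeCache) appendNeedle(size int64) {
	last := len(w.bufs) - 1
	if last < 0 || w.bufs[last].used+size > w.datCapLimit {
		// The full buffer is flushed (asynchronously) toward the read cache;
		// a fresh buffer takes over for new writes.
		w.bufs = append(w.bufs, &writeBuffer{})
		last++
	}
	w.bufs[last].used += size
}
```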

During reads, the system searches buffers in reverse order, then read cache, and finally OSS if necessary.
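The lookup order on the leader can be sketched as follows; the types and lookup helpers are hypothetical, and only the ordering (write buffers in reverse, then read cache, then OSS with asynchronous backfill) reflects the description above.

```go
// Read-path lookup order sketch; the store and OSS interfaces are assumptions.
package blobcache

type store interface {
	lookup(key string) ([]byte, bool)
	put(key string, data []byte)
}

type ossClient interface {
	Get(key string) ([]byte, error)
}

type vid struct {
	writeBufs []store   // newest buffer is last
	readCache store     // leader-only read cache
	oss       ossClient // backing object storage
}

func (v *vid) read(key string) ([]byte, error) {
	// 1. Search write buffers from newest to oldest.
	for i := len(v.writeBufs) - 1; i >= 0; i-- {
		if data, ok := v.writeBufs[i].lookup(key); ok {
			return data, nil
		}
	}
	// 2. Then the read cache.
	if data, ok := v.readCache.lookup(key); ok {
		return data, nil
	}
	// 3. Finally OSS, with an asynchronous backfill of the read cache.
	data, err := v.oss.Get(key)
	if err != nil {
		return nil, err
	}
	go v.readCache.put(key, data)
	return data, nil
}
```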

4.3 Cache Scaling

When a user changes the cache size, the system recalculates the number of VIDs for the collection. If the VID count remains unchanged, only cache capacities are updated; otherwise, a new set of VIDs is generated and the old set is asynchronously retired.
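A sketch of that decision, reusing the vidCount formula from section 4.1.1; allocateVIDs and retire are placeholder names for the master-side operations, not BlobCache's API.

```go
// Cache scaling decision sketch; names are illustrative.
package blobcache

import "math"

// vidCountFor repeats the formula from section 4.1.1.
func vidCountFor(cacheCapGB, cacheBaseGB float64, vidBase int) int {
	return int(math.Floor(cacheCapGB/cacheBaseGB))*vidBase + vidBase
}

type collection struct {
	cacheCapGB float64
	vids       []string // identifiers of the current VID set
}

// resize applies a new user-configured cache size to the collection.
func (c *collection) resize(newCapGB, cacheBaseGB float64, vidBase int) {
	newCount := vidCountFor(newCapGB, cacheBaseGB, vidBase)
	if newCount == len(c.vids) {
		// Same VID count: only the per-VID cache capacities change.
		c.cacheCapGB = newCapGB
		return
	}
	// Different VID count: generate a new VID set and retire the old one asynchronously.
	old := c.vids
	c.vids = allocateVIDs(newCount)
	c.cacheCapGB = newCapGB
	go retire(old)
}

// allocateVIDs and retire stand in for the master-side operations.
func allocateVIDs(n int) []string { return make([]string, n) }
func retire(old []string)         {}
```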

4.4 Cache Elasticity

If a cache remains idle for a configurable period (default 1 hour), the system reclaims unused logical space, keeping only the minimum required for write buffers and any existing read data.

When previously idle cache becomes active, the system expands the allocated space back to the configured size.

Example: A 20 GB cache (10 GB write, 10 GB read) initially occupies 40 GB of logical space (three write replicas plus one read replica). After prolonged inactivity it can shrink to 6 GB, allowing the cluster to over-provision cache by up to roughly 6.7× (40 GB ÷ 6 GB).
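The numbers in this example work out as follows; only the 6 GB idle floor comes from the scenario above, the rest is replica arithmetic.

```go
// Worked arithmetic for the elasticity example above.
package main

import "fmt"

func main() {
	writeGB, readGB := 10.0, 10.0
	active := 3*writeGB + 1*readGB // three write replicas + one read replica = 40 GB
	idle := 6.0                    // minimum kept for write buffers and existing read data
	fmt.Printf("active: %.0f GB, idle: %.0f GB, over-provision factor: %.1fx\n",
		active, idle, active/idle)
}
```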

Tags: System Architecture, Distributed Cache, Cloud Native Storage, PoleFS, BlobCache
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
