Why OSSFS 2.0 Outperforms 1.0 for AI Workloads on ACK Clusters
The article explains how Alibaba Cloud's OSSFS 2.0 redesign—dropping full POSIX semantics, leveraging FUSE 3 low‑level APIs, a lightweight metadata cache and internal coroutine technology—delivers up to 18‑fold faster sequential writes, 8.5‑fold faster reads and 280‑fold higher small‑file concurrency compared with OSSFS 1.0, making it ideal for AI training and inference in Kubernetes (ACK) environments.
Background
Alibaba Cloud Object Storage Service (OSS) provides massive, secure, low‑cost storage with up to 100 Gbps download bandwidth in several regions. To let containers in a Kubernetes (ACK) cluster read and write OSS data like a local file system, a FUSE‑based client is required to translate POSIX operations into RESTful OSS requests.
Why OSSFS 1.0 Struggles with AI Scenarios
OSSFS 1.0, derived from the open‑source S3FS‑FUSE project, implements a full POSIX layer. This design forces frequent HeadObject metadata requests and requires written data to be staged on local disk before upload. The resulting CPU, disk‑I/O and network bottlenecks are unacceptable for AI training (large datasets, many small files) and inference (large model files).
POSIX metadata (UID, permissions, symlinks) requires many metadata requests.
Random‑write support forces data to be cached on the node’s ESSD disk, limiting throughput.
Optimisations such as readdir_optimize and direct_read help only in specific cases.
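The metadata cost described above can be made concrete with a rough request-count model. This is a sketch, not a measurement: it assumes one listing request per page of up to 1,000 keys and, for the full-POSIX case, one extra HeadObject request per file. The page size and the function name `metadata_requests` are illustrative assumptions.

```python
import math

# Assumed maximum number of keys returned by one listing request.
LIST_PAGE_SIZE = 1000


def metadata_requests(num_files: int, head_per_file: bool) -> int:
    """Estimate the requests needed to list a directory of num_files objects.

    head_per_file=True models a client that issues one HeadObject per file
    to fetch POSIX attributes; False models a client that relies only on
    the size and mtime already present in the listing response.
    """
    list_requests = max(1, math.ceil(num_files / LIST_PAGE_SIZE))
    head_requests = num_files if head_per_file else 0
    return list_requests + head_requests


if __name__ == "__main__":
    for n in (1_000, 100_000):
        full = metadata_requests(n, head_per_file=True)
        lean = metadata_requests(n, head_per_file=False)
        print(f"{n} files: {full} requests with per-file HEAD, {lean} without")
```

Under these assumptions the per-file HEAD requests dominate: the request count grows linearly with the number of files instead of with the number of listing pages.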
OSSFS 2.0 Design Goals
OSSFS 2.0 abandons full POSIX compatibility and focuses on exploiting OSS’s native high‑bandwidth capabilities. Key changes include:
Maintain only essential file attributes (mtime, size) to reduce metadata requests.
Re‑implement the client with the FUSE 3 low‑level API, cutting thread switches and data copies.
Introduce a more flexible metadata cache with faster lookup and eviction.
Use Alibaba Cloud’s internal coroutine framework to improve concurrency and lower CPU usage.
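To illustrate what a lightweight metadata cache with lookup and eviction might look like, here is a minimal sketch that stores only size and mtime per path. It is not the OSSFS 2.0 implementation, whose internals the article does not describe. The TTL-plus-LRU policy and the names `Attr` and `MetadataCache` are assumptions chosen for illustration.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Attr:
    """The only attributes kept per file in this sketch."""
    size: int
    mtime: float


class MetadataCache:
    """Hypothetical cache: entries expire after a TTL, oldest-used evicted first."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps path -> (expiry time, Attr); insertion order tracks recency.
        self._entries = OrderedDict()

    def get(self, path: str) -> Optional[Attr]:
        item = self._entries.get(path)
        if item is None:
            return None
        expires_at, attr = item
        if self._clock() >= expires_at:
            # Stale entry: drop it so the caller refetches from OSS.
            del self._entries[path]
            return None
        self._entries.move_to_end(path)  # mark as most recently used
        return attr

    def put(self, path: str, attr: Attr) -> None:
        self._entries[path] = (self._clock() + self._ttl, attr)
        self._entries.move_to_end(path)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    cache = MetadataCache(capacity=2, ttl_seconds=60)
    cache.put("/data/a.bin", Attr(size=1024, mtime=0.0))
    print(cache.get("/data/a.bin"))
    print(cache.get("/data/missing.bin"))
```

Both lookup and eviction are constant-time here because `OrderedDict` keeps entries in recency order. A cache hit avoids a metadata request to OSS entirely.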
Performance Evaluation
Using the fio tool, both clients were benchmarked on single‑thread sequential write/read of a 100 GB file, multi‑thread (4‑thread) reads, and 128‑thread small‑file (128 KB) reads. Results (bandwidth, CPU core utilisation, peak memory) show:
Sequential write 100 GB:
OSSFS 2.0 – 2.2 GB/s, 207 % CPU, 2167 MB RAM
OSSFS 1.0 – 118 MB/s, 5 % CPU, 15 MB RAM
→ 18× faster
Sequential read 100 GB (single thread):
OSSFS 2.0 – 3.0 GB/s, 378 % CPU, 1617 MB RAM
OSSFS 1.0 – 355 MB/s, 50 % CPU, 400 MB RAM
→ 8.5× faster
Sequential read 100 GB (4 threads):
OSSFS 2.0 – 7.1 GB/s, 1187 % CPU, 6.2 GB RAM
OSSFS 1.0 – 1.4 GB/s, 210 % CPU, 1.6 GB RAM
→ 5× faster
128‑thread small‑file (128 KB) read:
OSSFS 2.0 – 1 GB/s, 247 % CPU, 212 MB RAM
OSSFS 1.0 – 3.5 MB/s, 3 % CPU, 200 MB RAM
→ 280× faster
In an AI inference simulation, a 134.5 GB safetensors model (Qwen‑2.5‑72B‑Instruct) was loaded with the vLLM library on a 128‑vCPU node. OSSFS 2.0 completed the load in 130 seconds versus 1135 seconds for OSSFS 1.0, roughly 8.7× faster, which confirms the large‑model optimisation.
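The multipliers quoted above can be rechecked from the raw figures. The sketch below copies the bandwidths from the results and assumes decimal units (1 GB/s = 1000 MB/s). The variable and function names are illustrative.

```python
# (label, OSSFS 2.0 bandwidth in MB/s, OSSFS 1.0 bandwidth in MB/s)
# Values copied from the results above; GB/s converted assuming 1 GB = 1000 MB.
RESULTS = [
    ("sequential write, 1 thread", 2200.0, 118.0),
    ("sequential read, 1 thread", 3000.0, 355.0),
    ("sequential read, 4 threads", 7100.0, 1400.0),
    ("small-file read, 128 threads", 1000.0, 3.5),
]

MODEL_SIZE_GB = 134.5
LOAD_SECONDS = {"OSSFS 2.0": 130.0, "OSSFS 1.0": 1135.0}


def speedup(new: float, old: float) -> float:
    """Ratio of the new figure to the old one."""
    return new / old


def effective_bandwidth_gbps(size_gb: float, seconds: float) -> float:
    """Average throughput in GB/s implied by a load time."""
    return size_gb / seconds


if __name__ == "__main__":
    for label, v2, v1 in RESULTS:
        print(f"{label}: {speedup(v2, v1):.1f}x")
    ratio = speedup(LOAD_SECONDS["OSSFS 1.0"], LOAD_SECONDS["OSSFS 2.0"])
    print(f"model load: {ratio:.1f}x faster")
    for client, secs in LOAD_SECONDS.items():
        bw = effective_bandwidth_gbps(MODEL_SIZE_GB, secs)
        print(f"{client}: {bw:.2f} GB/s effective")
```

The computed ratios are about 18.6×, 8.5×, 5.1× and 286×, so the article's 18×, 8.5×, 5× and 280× are rounded figures. The model load works out to about 8.7× faster, or roughly 1.03 GB/s versus 0.12 GB/s of effective throughput.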
Using OSSFS 2.0 in ACK
The ACK CSI driver now supports a fuseType: ossfs2 attribute. Deploying the following YAML creates a PersistentVolume (PV) and PersistentVolumeClaim (PVC) that mount an OSS bucket via OSSFS 2.0:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ossfs2
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-ossfs2
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      fuseType: ossfs2
      bucket: cnfs-oss-test
      path: /subpath
      url: oss-cn-hangzhou-internal.aliyuncs.com
      otherOpts: "-o close_to_open=false"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-ossfs2
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 20Gi
  volumeName: pv-ossfs2
The volume can be mounted statically or dynamically; detailed usage and mount‑option documentation are linked in the original article.
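A workload consumes the claim like any other PVC. As a hedged sketch, the script below builds a Pod manifest that mounts pvc-ossfs2 read-only and prints it as JSON, which `kubectl apply -f -` accepts. The Pod name, container image, command and mount path are hypothetical placeholders, not values from the original article.

```python
import json


def pod_manifest(pod_name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Build a Pod spec that mounts an existing PVC read-only."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "namespace": "default"},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Placeholder command that keeps the container alive.
                    "command": ["sleep", "3600"],
                    "volumeMounts": [
                        {"name": "oss-data", "mountPath": mount_path, "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "oss-data",
                    "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_manifest(
        pod_name="ossfs2-reader",   # hypothetical name
        image="busybox",            # hypothetical image
        claim_name="pvc-ossfs2",    # the PVC defined above
        mount_path="/data",         # hypothetical mount point
    )
    print(json.dumps(manifest, indent=2))
```

Piping the output to `kubectl apply -f -` would create the Pod. The claim is mounted read-only to match the ReadOnlyMany access mode declared on the PV and PVC.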
Recommendation Matrix
For high‑throughput AI training, inference, or any workload that reads large files sequentially or needs massive small‑file concurrency, OSSFS 2.0 is the preferred storage volume. For workloads that require full POSIX semantics, random writes, or fine‑grained permission control, OSSFS 1.0 remains appropriate.
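The recommendation above reduces to a simple rule, sketched here as a small helper. The `Workload` fields and the function name are illustrative assumptions, and real workloads may need criteria the article does not cover.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what a workload needs from its volume."""
    needs_full_posix: bool = False
    needs_random_writes: bool = False
    needs_fine_grained_permissions: bool = False


def recommend_client(workload: Workload) -> str:
    """Apply the article's rule: any POSIX-heavy requirement means OSSFS 1.0."""
    if (workload.needs_full_posix
            or workload.needs_random_writes
            or workload.needs_fine_grained_permissions):
        return "OSSFS 1.0"
    return "OSSFS 2.0"


if __name__ == "__main__":
    training = Workload()  # sequential reads, many small files
    editor = Workload(needs_random_writes=True)
    print("training:", recommend_client(training))
    print("in-place editing:", recommend_client(editor))
```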
Conclusion
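OSSFS 2.0 delivers substantially higher read/write throughput than OSSFS 1.0 for AI and other big‑data workloads on ACK clusters: roughly 5× to 18× on large sequential I/O and about 280× on concurrent small‑file reads in the benchmarks above. That throughput comes with higher absolute CPU and memory consumption on the node, as the same benchmarks show. OSSFS 2.0 is supported by the Alibaba Cloud CSI driver through the fuseType: ossfs2 attribute.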
Alibaba Cloud Native
