
Why OSSFS 2.0 Outperforms 1.0 for AI Workloads on ACK Clusters

The article explains how Alibaba Cloud's OSSFS 2.0 redesign, which drops full POSIX semantics and adopts the FUSE 3 low‑level API, a lightweight metadata cache and an internal coroutine framework, delivers roughly 18‑fold faster sequential writes, 8.5‑fold faster single‑thread sequential reads and 280‑fold higher small‑file read throughput than OSSFS 1.0 in the reported benchmarks, making it well suited to AI training and inference on Kubernetes (ACK) clusters.

Alibaba Cloud Native
OSSFS Overview

Background

Alibaba Cloud Object Storage Service (OSS) provides massive, secure, low‑cost storage with up to 100 Gbps download bandwidth in several regions. To let containers in a Kubernetes (ACK) cluster read and write OSS data like a local file system, a FUSE‑based client is required to translate POSIX operations into RESTful OSS requests.
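To make the translation step concrete, here is a minimal sketch of how a file read at a given offset could be expressed as ranged HTTP GET requests. It is not OSSFS code: the function names and the part size are hypothetical, and the only fact relied on is that HTTP byte ranges are inclusive on both ends.

```python
def range_header(offset: int, size: int) -> dict:
    """Build the HTTP Range header for reading `size` bytes starting at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive, so the last byte is offset + size - 1.
    return {"Range": f"bytes={offset}-{offset + size - 1}"}


def split_read(offset: int, size: int, part_size: int) -> list:
    """Split one large read into several ranged requests of at most `part_size` bytes."""
    headers = []
    end = offset + size
    while offset < end:
        chunk = min(part_size, end - offset)
        headers.append(range_header(offset, chunk))
        offset += chunk
    return headers


# A 4 KiB read at the start of a file maps to a single ranged GET.
print(range_header(0, 4096))
```

A client that splits large reads this way can issue the parts concurrently, which is one way a FUSE client can approach the bandwidth the object store offers.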

Why OSSFS 1.0 Struggles with AI Scenarios

OSSFS 1.0, derived from the open‑source S3FS‑FUSE project, implements a full POSIX layer. This design forces frequent HeadObject metadata calls and requires written data to be staged on local disk before upload, creating CPU, disk‑I/O and network bottlenecks that are unacceptable for AI training (large datasets, many small files) and inference (large model files).

POSIX metadata (UID, permissions, symlinks) requires many metadata requests.

Random‑write support forces data to be cached on the node’s ESSD disk, limiting throughput.

Optimisations such as readdir_optimize and direct_read help only in specific cases.

OSSFS 2.0 Design Goals

OSSFS 2.0 abandons full POSIX compatibility and focuses on exploiting OSS’s native high‑bandwidth capabilities. Key changes include:

Maintain only essential file attributes (mtime, size) to reduce metadata requests.

Re‑implement the client with the FUSE 3 low‑level API, cutting thread switches and data copies.

Introduce a more flexible metadata cache with faster lookup and eviction.

Use Alibaba Cloud’s internal coroutine framework to improve concurrency and lower CPU usage.
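The metadata cache idea can be illustrated with a small sketch. The article does not describe the actual cache design, so everything here is an assumption: the class name, the LRU‑plus‑TTL policy and the parameters are hypothetical. The sketch keeps only the two attributes named above (size and mtime) and serves lookups locally until an entry expires or is evicted.

```python
import time
from collections import OrderedDict
from typing import NamedTuple, Optional


class Attr(NamedTuple):
    size: int     # object size in bytes
    mtime: float  # last-modified time, seconds since the epoch


class MetadataCache:
    """Hypothetical LRU cache with a per-entry time-to-live."""

    def __init__(self, capacity: int, ttl_seconds: float, clock=time.monotonic):
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # path -> (expires_at, Attr)

    def put(self, path: str, attr: Attr) -> None:
        self._entries[path] = (self._clock() + self._ttl, attr)
        self._entries.move_to_end(path)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, path: str) -> Optional[Attr]:
        item = self._entries.get(path)
        if item is None:
            return None  # miss: the caller would issue a metadata request
        expires_at, attr = item
        if self._clock() >= expires_at:
            del self._entries[path]  # stale entry: treat as a miss
            return None
        self._entries.move_to_end(path)  # mark as recently used
        return attr
```

Every hit served from such a cache is one metadata request that never reaches OSS, which matters most when a training job touches many small files repeatedly.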

Performance Evaluation

Using the fio tool, both clients were benchmarked on single‑thread sequential write/read of a 100 GB file, multi‑thread (4‑thread) reads, and 128‑thread small‑file (128 KB) reads. Results (bandwidth, CPU core utilisation, peak memory) show:

Sequential write 100 GB:
  OSSFS 2.0  – 2.2 GB/s, 207 % CPU, 2167 MB RAM
  OSSFS 1.0  – 118 MB/s,   5 % CPU,   15 MB RAM
  → 18× faster

Sequential read 100 GB (single thread):
  OSSFS 2.0  – 3.0 GB/s, 378 % CPU, 1617 MB RAM
  OSSFS 1.0  – 355 MB/s, 50 % CPU,   400 MB RAM
  → 8.5× faster

Sequential read 100 GB (4 threads):
  OSSFS 2.0  – 7.1 GB/s, 1187 % CPU, 6.2 GB RAM
  OSSFS 1.0  – 1.4 GB/s, 210 % CPU, 1.6 GB RAM
  → 5× faster

128‑thread small‑file (128 KB) read:
  OSSFS 2.0  – 1 GB/s, 247 % CPU, 212 MB RAM
  OSSFS 1.0  – 3.5 MB/s, 3 % CPU, 200 MB RAM
  → 280× faster
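The speedup factors can be rechecked from the reported bandwidths. The short script below copies the figures from the results above and assumes 1 GB/s = 1000 MB/s; it also derives CPU utilisation per GB/s of throughput, a metric the article does not report, as a rough efficiency indicator.

```python
# (bandwidth in MB/s, CPU utilisation in %) copied from the results above.
RESULTS = {
    "sequential write, 100 GB": {"v2": (2200.0, 207), "v1": (118.0, 5)},
    "sequential read, 100 GB, 1 thread": {"v2": (3000.0, 378), "v1": (355.0, 50)},
    "sequential read, 100 GB, 4 threads": {"v2": (7100.0, 1187), "v1": (1400.0, 210)},
    "128 KB files, 128 threads": {"v2": (1000.0, 247), "v1": (3.5, 3)},
}


def summarize(results: dict) -> dict:
    """Return the speedup and CPU-% per GB/s for each benchmark."""
    rows = {}
    for name, r in results.items():
        (bw2, cpu2), (bw1, cpu1) = r["v2"], r["v1"]
        rows[name] = {
            "speedup": bw2 / bw1,
            "cpu_per_gbps_v2": cpu2 / (bw2 / 1000),
            "cpu_per_gbps_v1": cpu1 / (bw1 / 1000),
        }
    return rows


for name, row in summarize(RESULTS).items():
    print(f"{name}: {row['speedup']:.1f}x, "
          f"CPU%/GB/s 2.0={row['cpu_per_gbps_v2']:.0f} 1.0={row['cpu_per_gbps_v1']:.0f}")
```

The computed ratios come out at about 18.6, 8.5, 5.1 and 286, which agrees with the rounded factors quoted in the results. The per‑GB/s figures show that OSSFS 2.0 always uses more CPU in absolute terms, while its CPU cost per unit of throughput is lower for small‑file reads and higher for sequential writes.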

In an AI inference simulation that loaded a 134.5 GB safetensors model (Qwen‑2.5‑72B‑Instruct) with the vLLM library on a 128‑vCPU node, OSSFS 2.0 completed the load in 130 seconds versus 1135 seconds for OSSFS 1.0, about 8.7 times faster, which is consistent with the single‑thread sequential read results.
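As a sanity check, the load times can be converted into an effective end‑to‑end throughput. This is simple arithmetic on the figures quoted above; the resulting throughput numbers are derived here and are not reported in the source.

```python
MODEL_SIZE_GB = 134.5
LOAD_SECONDS = {"OSSFS 2.0": 130, "OSSFS 1.0": 1135}


def effective_gb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput over the whole load, including any non-I/O time."""
    return size_gb / seconds


for client, seconds in LOAD_SECONDS.items():
    print(f"{client}: {effective_gb_per_s(MODEL_SIZE_GB, seconds):.2f} GB/s")

speedup = LOAD_SECONDS["OSSFS 1.0"] / LOAD_SECONDS["OSSFS 2.0"]
print(f"load-time speedup: {speedup:.1f}x")
```

This works out to roughly 1.03 GB/s for OSSFS 2.0 and 0.12 GB/s for OSSFS 1.0. Both are below the raw fio bandwidths, which is plausible because model loading includes work other than reading bytes.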

Using OSSFS 2.0 in ACK

The ACK CSI driver now supports a fuseType: ossfs2 attribute. Deploying the following YAML creates a PersistentVolume (PV) and PersistentVolumeClaim (PVC) that mount an OSS bucket via OSSFS 2.0:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ossfs2
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-ossfs2
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
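    volumeAttributes:
      fuseType: ossfs2            # select the OSSFS 2.0 client
      bucket: cnfs-oss-test       # OSS bucket to mount
      path: /subpath              # directory inside the bucket to expose
      url: oss-cn-hangzhou-internal.aliyuncs.com   # internal endpoint of the bucket's region
      otherOpts: "-o close_to_open=false"          # extra mount options passed to the client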
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-ossfs2
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 20Gi
  volumeName: pv-ossfs2

The volume can be mounted statically or dynamically; detailed usage and mount‑option documentation are linked in the original article.
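To consume the claim, a workload references pvc-ossfs2 in its volume list. The sketch below builds a minimal Pod manifest in Python and prints it as JSON, which kubectl accepts in place of YAML. Only the claim name and namespace come from the manifest above; the Pod name, container image, command and mount path are hypothetical placeholders.

```python
import json


def pod_manifest(claim_name: str, mount_path: str = "/data") -> dict:
    """Build a Pod that mounts the given PVC read-only (illustrative names only)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "ossfs2-demo", "namespace": "default"},  # hypothetical name
        "spec": {
            "containers": [{
                "name": "reader",
                "image": "busybox",  # placeholder image
                "command": ["sh", "-c", f"ls {mount_path} && sleep 3600"],
                "volumeMounts": [{
                    "name": "oss-data",
                    "mountPath": mount_path,
                    "readOnly": True,  # matches the ReadOnlyMany access mode
                }],
            }],
            "volumes": [{
                "name": "oss-data",
                "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
            }],
        },
    }


print(json.dumps(pod_manifest("pvc-ossfs2"), indent=2))
```

If the script is saved as, say, make_pod.py, the output can be piped to kubectl apply -f - to create the Pod.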

Recommendation Matrix

For high‑throughput AI training, inference, or any workload that reads large files sequentially or needs massive small‑file concurrency, OSSFS 2.0 is the preferred storage volume. For workloads that require full POSIX semantics, random writes, or fine‑grained permission control, OSSFS 1.0 remains appropriate.
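The recommendation can be written down as a small decision helper. This is only a restatement of the paragraph above in code form; the function and parameter names are hypothetical and the rule is deliberately coarse.

```python
def recommend_client(needs_full_posix: bool = False,
                     needs_random_writes: bool = False,
                     needs_fine_grained_permissions: bool = False) -> str:
    """Return the OSSFS version suggested by the recommendation above."""
    if needs_full_posix or needs_random_writes or needs_fine_grained_permissions:
        return "OSSFS 1.0"
    # Sequential large-file reads and highly concurrent small-file reads
    # (typical for AI training and inference) favour the newer client.
    return "OSSFS 2.0"


print(recommend_client())                          # typical AI training or inference job
print(recommend_client(needs_random_writes=True))  # workload that rewrites files in place
```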

Conclusion

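OSSFS 2.0 substantially improves read/write throughput for AI and other big‑data workloads on ACK clusters: in the benchmarks above it was roughly 5 to 18 times faster for large sequential I/O and about 280 times faster for concurrent small‑file reads. These gains come with higher absolute CPU and memory consumption than OSSFS 1.0, so node resources should be sized accordingly. OSSFS 2.0 is supported by the Alibaba Cloud CSI driver through the fuseType: ossfs2 attribute.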

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
