
How to Seamlessly Move AI Data Between OSS and CPFS with Kubernetes VolumePopulator

This article explains how Kubernetes VolumePopulator can automatically transfer AI training data from low‑cost OSS storage to high‑performance CPFS volumes, enabling on‑demand model loading, cost‑effective hot‑cold data management, and fully automated lifecycle handling in cloud‑native AI workloads.

Alibaba Cloud Infrastructure

Background

In cloud‑native AI training and inference, developers need both high‑throughput, low‑latency storage (e.g., Alibaba Cloud CPFS or SSDs) for active workloads and inexpensive, massive‑capacity object storage (OSS) for long‑term data retention.

Challenge

Traditionally, moving data between these two tiers required custom scripts or sidecar containers, which added operational overhead and startup latency.

VolumePopulator Mechanism

The standard volume populator mechanism, in which a PVC's dataSourceRef points at a custom resource that describes where the volume's initial data comes from, reached general availability in Kubernetes v1.33. Alibaba Cloud Container Service for Kubernetes (ACK) provides an implementation called OSSVolumePopulator, which automatically copies data from OSS to a high‑performance volume (such as CPFS) before the pod mounts the volume.

Practical Scenario: On‑Demand Model Loading

For model training or distributed inference, the full model repository is stored in OSS. Before a job starts, the required model (e.g., Qwen3‑32B) is copied into a CPFS FileSet. The training or inference pod then mounts the PVC and reads the model at native CPFS speed. When the job finishes, deleting the PVC releases the hot data and stops storage charges.

Solution Advantages

Extreme I/O acceleration: Workloads read from the CPFS parallel file system instead of OSS, so OSS request latency and bandwidth limits stay off the hot path and GPUs spend less time waiting for data.

Cost‑effective on‑demand usage: Data occupies CPFS only during task execution; the Delete reclaim policy automatically removes the hot data after the PVC is deleted.

Fully managed closed‑loop: The data‑fill process runs in the control plane and requires no extra sidecar pods, which frees compute and network resources.

Data decoupling and security: The original data stays in OSS; hot volumes are independent of it, so different tasks or model versions can be filled concurrently.

Step‑by‑Step Implementation

1. Define the data source (OSSVolumePopulator)

apiVersion: storage.alibabacloud.com/v1beta1
kind: OSSVolumePopulator
metadata:
  name: qwen3-32b
  namespace: bmcpfs-dataflow-demo  # must match PVC namespace
spec:
  bucket: models               # OSS bucket name
  region: cn-wulanchabu        # OSS bucket region
  endpoint: oss-cn-wulanchabu-internal.aliyuncs.com
  path: /Qwen3-32B/            # model path in bucket
  mode: bmcpfs-dataflow

2. Define the high‑performance storage target (CPFS)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-bmcpfs-test
provisioner: bmcpfsplugin.csi.alibabacloud.com
parameters:
  bmcpfsId: bmcpfs-29000z8xz3lf5nj*****
allowVolumeExpansion: true
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qwen3-32b
  namespace: bmcpfs-dataflow-demo
spec:
  accessModes:
    - ReadOnlyMany
  dataSourceRef:
    apiGroup: storage.alibabacloud.com
    kind: OSSVolumePopulator
    name: qwen3-32b
  resources:
    requests:
      storage: 80Gi   # at least the size of the source data
  storageClassName: alicloud-bmcpfs-test
  volumeMode: Filesystem
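
The two manifests above have to agree in several places: the PVC's dataSourceRef must name the populator's API group, kind, and name, both objects must live in the same namespace, and the storage request must be at least as large as the source data. The following is a minimal pre‑flight sketch in Python that checks these points before anything is applied. The function names (quantity_to_bytes, preflight) and the 66 GB source size are hypothetical and chosen only for illustration; the real size should be read from the bucket.

```python
from typing import List

# Binary suffixes used by Kubernetes resource quantities (subset only).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '80Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def preflight(populator: dict, pvc: dict, source_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    ref = pvc["spec"]["dataSourceRef"]
    # The API group is the part of apiVersion before the slash.
    group = populator["apiVersion"].split("/")[0]
    if ref.get("apiGroup") != group:
        problems.append("dataSourceRef.apiGroup does not match the populator")
    if ref.get("kind") != populator["kind"]:
        problems.append("dataSourceRef.kind does not match the populator")
    if ref.get("name") != populator["metadata"]["name"]:
        problems.append("dataSourceRef.name does not match the populator")
    if pvc["metadata"]["namespace"] != populator["metadata"]["namespace"]:
        problems.append("PVC and populator are in different namespaces")
    requested = quantity_to_bytes(pvc["spec"]["resources"]["requests"]["storage"])
    if requested < source_bytes:
        problems.append("storage request is smaller than the source data")
    return problems


# Only the fields the checks need, copied from the manifests in this article.
POPULATOR = {
    "apiVersion": "storage.alibabacloud.com/v1beta1",
    "kind": "OSSVolumePopulator",
    "metadata": {"name": "qwen3-32b", "namespace": "bmcpfs-dataflow-demo"},
}
PVC = {
    "metadata": {"name": "qwen3-32b", "namespace": "bmcpfs-dataflow-demo"},
    "spec": {
        "dataSourceRef": {
            "apiGroup": "storage.alibabacloud.com",
            "kind": "OSSVolumePopulator",
            "name": "qwen3-32b",
        },
        "resources": {"requests": {"storage": "80Gi"}},
    },
}

# Hypothetical source size of 66 GB; measure the real prefix in OSS instead.
print(preflight(POPULATOR, PVC, source_bytes=66 * 10**9))
```

With the values from this article the list of problems is empty, since 80Gi is about 85.9 GB. Moving the PVC to another namespace or shrinking the request below the source size makes the corresponding message appear.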

3. Automatic data fill

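When the PVC is created against a StorageClass with Immediate volume binding (the Kubernetes default, which applies to the StorageClass above because it sets no volumeBindingMode), ACK triggers a CPFS data‑flow task that copies the model from OSS to the newly created CPFS FileSet. Progress can be inspected with kubectl describe ossvolumepopulator <name> or via the NAS console. The describe output includes a section like the following: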

Bmcpfs Dataflow:
  62a4e7ec-fae1-4f11-848f-b57cxxxxxxxx:
    Data Flow Id:       df-29d3ad9e9xxxxxxx
    Data Flow Task Id:  task-2993179xxxxxxxxx
    File Set Id:        fset-2997498xxxxxxxxx
    File System Id:     bmcpfs-29000z8xz3lf5xxxxxxxx
    Progress:           59%
4. Mounting and cleanup

Mount the PVC to the workload (e.g., /models/Qwen3-32B) and start the inference engine:

python3 -m sglang.launch_server --model-path /models/Qwen3-32B --tp 2
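
The mount itself is an ordinary PVC volume on the pod. As a rough sketch, the Python snippet below builds a Pod manifest that mounts the qwen3-32b claim read‑only at /models/Qwen3-32B and runs the command above, then prints it as JSON, which kubectl apply -f accepts in the same way as YAML. The pod name, the container image, and the GPU resource name nvidia.com/gpu are assumptions for illustration and do not come from this article; replace them with the values used in your cluster.

```python
import json


def inference_pod(pvc_name: str, namespace: str, mount_path: str,
                  image: str, gpus: int) -> dict:
    """Build a Pod manifest that mounts the populated PVC read-only."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Hypothetical pod name.
        "metadata": {"name": "sglang-qwen3-32b", "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "sglang",
                "image": image,
                # Same command as in the article; --tp follows the GPU count.
                "command": ["python3", "-m", "sglang.launch_server",
                            "--model-path", mount_path, "--tp", str(gpus)],
                # Assumes GPUs are exposed under the nvidia.com/gpu resource.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "volumeMounts": [{"name": "model",
                                  "mountPath": mount_path,
                                  "readOnly": True}],
            }],
            "volumes": [{
                "name": "model",
                "persistentVolumeClaim": {"claimName": pvc_name,
                                          "readOnly": True},
            }],
        },
    }


manifest = inference_pod(
    pvc_name="qwen3-32b",
    namespace="bmcpfs-dataflow-demo",
    mount_path="/models/Qwen3-32B",
    image="registry.example.com/sglang:latest",  # placeholder image
    gpus=2,
)
print(json.dumps(manifest, indent=2))
```

The pod must be created in the same namespace as the PVC, because a pod can only reference claims in its own namespace.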

When no further tasks need the model, delete the PVC; the hot data is removed and storage charges stop.

Conclusion

ACK’s VolumePopulator‑based storage fill solution combines the Kubernetes standard mechanism with Alibaba Cloud CPFS’s data‑flow capability, delivering peak performance for AI compute while dramatically reducing storage costs through automated hot‑cold data management.
