How to Seamlessly Move AI Data Between OSS and CPFS with Kubernetes VolumePopulator
This article explains how Kubernetes VolumePopulator can automatically transfer AI training data from low‑cost OSS storage to high‑performance CPFS volumes, enabling on‑demand model loading, cost‑effective hot‑cold data management, and fully automated lifecycle handling in cloud‑native AI workloads.
Background
In cloud‑native AI training and inference, developers need both high‑throughput, low‑latency storage (e.g., Alibaba Cloud CPFS or SSDs) for active workloads and inexpensive, massive‑capacity object storage (OSS) for long‑term data retention.
Challenge
Traditionally, moving data between these two tiers required custom scripts or sidecar containers, adding operational overhead and startup latency.
VolumePopulator Mechanism
The standard VolumePopulator data‑fill pattern (the AnyVolumeDataSource feature) became generally available in Kubernetes v1.33. Alibaba Cloud Container Service for Kubernetes (ACK) provides an implementation called OSSVolumePopulator, which automatically copies data from OSS to a high‑performance volume (such as CPFS) before the pod mounts the volume.
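Before creating any resources, you can confirm that the populator CRD is installed in the cluster; the plural CRD name below is an assumption derived from the kind and API group shown later in this article:

kubectl get crd ossvolumepopulators.storage.alibabacloud.com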
Practical Scenario: On‑Demand Model Loading
For model training or distributed inference, the full model repository is stored in OSS. Before a job starts, the required model (e.g., Qwen3‑32B) is copied into a CPFS FileSet. The training or inference pod then mounts the PVC and reads the model at native CPFS speed. When the job finishes, deleting the PVC releases the hot data and stops storage charges.
Solution Advantages
Extreme I/O acceleration: the CPFS parallel file system eliminates OSS latency and bandwidth limits, maximizing GPU utilization.
Cost‑effective on‑demand usage: data occupies CPFS only during task execution; the Delete reclaim policy automatically removes hot data after the PVC is deleted.
Fully managed closed loop: the data‑fill process runs in the control plane, requiring no extra sidecar pods, thus freeing compute and network resources.
Data decoupling and security: original data stays in OSS; hot volumes are independent, allowing concurrent fills for different tasks or model versions.
Step‑by‑Step Implementation
1. Define the data source (OSSVolumePopulator)
apiVersion: storage.alibabacloud.com/v1beta1
kind: OSSVolumePopulator
metadata:
  name: qwen3-32b
  namespace: bmcpfs-dataflow-demo   # must match PVC namespace
spec:
  bucket: models                    # OSS bucket name
  region: cn-wulanchabu             # OSS bucket region
  endpoint: oss-cn-wulanchabu-internal.aliyuncs.com
  path: /Qwen3-32B/                 # model path in bucket
  mode: bmcpfs-dataflow

2. Define the high‑performance storage target (CPFS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-bmcpfs-test
provisioner: bmcpfsplugin.csi.alibabacloud.com
parameters:
  bmcpfsId: bmcpfs-29000z8xz3lf5nj*****
allowVolumeExpansion: true
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qwen3-32b
  namespace: bmcpfs-dataflow-demo
spec:
  accessModes:
  - ReadOnlyMany
  dataSourceRef:
    apiGroup: storage.alibabacloud.com
    kind: OSSVolumePopulator
    name: qwen3-32b
  resources:
    requests:
      storage: 80Gi                 # at least the size of the source data
  storageClassName: alicloud-bmcpfs-test
  volumeMode: Filesystem

3. Automatic data fill
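Apply both manifests to create the resources; the file names here are illustrative:

kubectl apply -f ossvolumepopulator.yaml
kubectl apply -f cpfs-storageclass-pvc.yaml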
When the PVC is created with a StorageClass whose volume binding mode is Immediate, ACK triggers a CPFS data‑flow task that copies the model from OSS to the newly created CPFS FileSet. Progress can be inspected with kubectl describe ossvolumepopulator <name> or via the NAS console.
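For example, with the names used in the steps above:

kubectl describe ossvolumepopulator qwen3-32b -n bmcpfs-dataflow-demo

The status section reports the underlying data‑flow task, similar to the following: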
Bmcpfs Dataflow:
  62a4e7ec-fae1-4f11-848f-b57cxxxxxxxx:
    Data Flow Id:       df-29d3ad9e9xxxxxxx
    Data Flow Task Id:  task-2993179xxxxxxxxx
    File Set Id:        fset-2997498xxxxxxxxx
    File System Id:     bmcpfs-29000z8xz3lf5xxxxxxxx
    Progress:           59%

4. Mounting and cleanup
Mount the PVC to the workload (e.g., /models/Qwen3-32B) and start the inference engine:
python3 -m sglang.launch_server --model-path /models/Qwen3-32B --tp 2

When no further tasks need the model, delete the PVC; the hot data is removed and storage charges stop.
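For reference, a minimal Pod spec that mounts the PVC read-only could look like the sketch below; the Pod name, container image, and GPU limits are illustrative assumptions, not part of the original setup:

apiVersion: v1
kind: Pod
metadata:
  name: qwen3-32b-infer               # illustrative name
  namespace: bmcpfs-dataflow-demo
spec:
  containers:
  - name: sglang
    image: lmsysorg/sglang:latest     # illustrative image tag
    command: ["python3", "-m", "sglang.launch_server",
              "--model-path", "/models/Qwen3-32B", "--tp", "2"]
    resources:
      limits:
        nvidia.com/gpu: 2             # --tp 2 expects two GPUs
    volumeMounts:
    - name: model
      mountPath: /models/Qwen3-32B    # mount point used by the command above
      readOnly: true
  volumes:
  - name: model
    persistentVolumeClaim:
      claimName: qwen3-32b            # PVC from step 2
      readOnly: true

Because the StorageClass uses reclaimPolicy: Delete, cleanup is a single command:

kubectl delete pvc qwen3-32b -n bmcpfs-dataflow-demo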
Conclusion
ACK’s VolumePopulator‑based data‑fill solution combines the standard Kubernetes mechanism with Alibaba Cloud CPFS’s data‑flow capability, delivering peak performance for AI compute while dramatically reducing storage costs through automated hot‑cold data management.