Eliminate Data Bottlenecks in Large‑Scale Argo Workflows with VolumePopulator
This guide shows how to combine the Kubernetes VolumePopulator support in Alibaba Cloud ACK with Argo Workflows so that each parallel task receives its own pre-populated, high-performance volume. Giving every task an independent volume removes I/O contention on the shared source, keeps task data isolated, and lets large-scale data-processing pipelines scale out, with the pre-fill work accelerated by serverless compute.
Problem Statement
When a large-scale Argo Workflow runs hundreds of parallel pods, simultaneous reads and writes against a single OSS path lead to bandwidth throttling, exhausted connection limits, and data contamination across tasks.
Kubernetes VolumePopulator (v1.33 GA)
VolumePopulator introduces a standard data‑fill pattern using dataSourceRef in a PersistentVolumeClaim (PVC). The populator copies data from a source (e.g., OSS) into a block storage volume before the pod mounts it, providing immediate high‑performance local I/O.
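As a rough illustration of the dataSourceRef pattern, the sketch below shows a standalone PVC that asks a populator to fill the volume before first use. The claim name prefilled-data is hypothetical; the apiGroup, kind, populator name, and StorageClass are borrowed from the ACK examples later in this guide, and the namespace is assumed to match the populator's.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prefilled-data          # hypothetical name, for illustration only
  namespace: argo               # assumed to be the populator's namespace
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: alicloud-disk-essd
  resources:
    requests:
      storage: 20Gi
  # Points at a populator custom resource instead of a snapshot or another PVC.
  dataSourceRef:
    apiGroup: storage.alibabacloud.com
    kind: OSSVolumePopulator
    name: generic-demo
```

The claim is expected to stay unbound until the populator has finished, so a pod that mounts it should start with the data already in place.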
ACK Implementation – OSSVolumePopulator
Alibaba Cloud Container Service for Kubernetes (ACK) supplies the CRD OSSVolumePopulator (apiVersion storage.alibabacloud.com/v1beta1) that implements the generic populator for Alibaba Cloud OSS.
OSSVolumePopulator Spec Example
apiVersion: storage.alibabacloud.com/v1beta1
kind: OSSVolumePopulator
metadata:
  name: generic-demo
  namespace: argo
spec:
  bucket: my-test-bucket
  region: cn-hangzhou
  endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  path: /many-files/
  mode: generic
  generic:
    labels:
      alibabacloud.com/acs: "true"
      alibabacloud.com/compute-class: "general-purpose"
    rrsaConfigs:
      roleArn: "acs:ram::123456789012:role/oss-populator"
      oidcProviderArn: "acs:ram::123456789012:oidc-provider/my-oidc-provider"
    # throughput: 1000 # optional MBps limit
Argo Workflow Integration
Define an ephemeral.volumeClaimTemplate that references the OSSVolumePopulator. Each parallel replica receives a distinct PVC that is pre‑filled with the OSS data.
Workflow Template Example
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-data-process-with-ossvp-
  namespace: argo
spec:
  arguments:
    parameters:
      - name: number
        value: "2"
  entrypoint: main
  volumes:
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              diskType: scratch-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "alicloud-disk-essd"
            resources:
              requests:
                storage: 20Gi
            dataSourceRef:
              apiGroup: storage.alibabacloud.com
              kind: OSSVolumePopulator
              name: generic-demo
  templates:
    - name: main
      dag:
        tasks:
          - name: echo-task
            template: echo-template
            arguments:
              parameters:
                - name: index
                  value: "{{item}}"
            withSequence:
              count: "{{workflow.parameters.number}}"
    - name: echo-template
      container:
        image: mirrors-ssl.aliyuncs.com/busybox:latest
        command: ["sh","-c"]
        args:
          - |
            echo "🚀 task {{inputs.parameters.index}} start"
            touch /scratch-volume/{{inputs.parameters.index}}-logs
            ls /scratch-volume
            echo "✅ task {{inputs.parameters.index}} done"
        volumeMounts:
          - name: scratch-volume
            mountPath: /scratch-volume
        resources:
          limits:
            cpu: "4"
            memory: 16Gi
          requests:
            cpu: "4"
            memory: 16Gi
      inputs:
        parameters:
          - name: index
Result Verification & Cleanup
Each pod's log lists the pre-filled OSS files plus only that task's own log file, confirming data isolation. After the workflow finishes and its pods are removed, the temporary PVCs are deleted automatically along with them; the associated cloud disks are reclaimed and billing stops.
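Because the PVC of a generic ephemeral volume is owned by its pod, how quickly disks are released depends on when the pods are deleted. One option, sketched here as an assumption rather than something this setup prescribes, is Argo's podGC setting, which removes pods as soon as they complete. Only the relevant fragment of the Workflow spec is shown.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-data-process-with-ossvp-
  namespace: argo
spec:
  entrypoint: main
  podGC:
    # Assumption: delete each pod once it completes, so its ephemeral PVC
    # (and the backing disk) is released without waiting for the Workflow
    # object itself to be deleted.
    strategy: OnPodCompletion
  # volumes and templates: unchanged from the workflow example
```

Deleting pods early also removes their logs from the cluster unless log archiving is configured. OnWorkflowSuccess is an alternative strategy that keeps pods around for inspection when a run fails.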
Operational Recommendations
Compute placement: Run pre-fill pods on Serverless pools or Spot instances to reduce cost.
Throughput throttling: Use the optional throughput field or a sidecar to cap MBps per task.
Reclaim policy: Set reclaimPolicy: Delete on the StorageClass so disks are removed immediately after PVC deletion.
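The reclaim-policy recommendation could look roughly like the StorageClass below. This is a sketch under assumptions: the class name scratch-essd-delete is hypothetical, and the provisioner and type values are what the ACK disk CSI plugin is assumed to expect, so check them against the StorageClasses already present in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scratch-essd-delete                      # hypothetical name
provisioner: diskplugin.csi.alibabacloud.com     # assumed ACK disk CSI provisioner
parameters:
  type: cloud_essd                               # assumed disk category parameter
reclaimPolicy: Delete                            # remove the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer          # create the disk in the zone where the pod is scheduled
```

To use it, the volumeClaimTemplate's storageClassName would point at this class instead of alicloud-disk-essd. Delete is already the default for dynamically provisioned classes, so setting it explicitly mainly guards against a class that was created with Retain.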
Deployment Checklist for ACK
Install components: Deploy Argo Workflows and the ACK storage-operator in the cluster.
Enable features: In the storage-operator configuration, set enableVolumePopulator: true and enableVolumePopulatorPodHandler: true.
RRSA permission: Grant the cluster-level RRSA role that allows the populator controller to read from the specified OSS bucket.
References
Argo Workflows documentation: https://argoproj.github.io/workflows/
Kubernetes Volume Populator GA (v1.33): https://kubernetes.io/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/
Argo Volumes guide: https://argo-workflows.readthedocs.io/en/latest/walk-through/volumes/
Alibaba Cloud OSSVolumePopulator guide: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/prefetch-oss-data-into-high-performance-volumes-on-demand
Argo Workflows component on ACK: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/batch-task-orchestration/
ACK storage‑operator component: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-storage-operator-to-deploy-and-upgrade-storage-components
RRSA authentication guide: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-rrsa-to-authorize-pods-to-access-different-cloud-services
