
Eliminate Data Bottlenecks in Large‑Scale Argo Workflows with VolumePopulator

This guide shows how integrating Alibaba Cloud ACK’s Kubernetes VolumePopulator with Argo Workflows lets you pre‑populate an independent high‑performance volume for each parallel task, eliminating I/O contention, ensuring data isolation, and enabling scalable, serverless‑accelerated pipelines for large‑scale data processing.

Alibaba Cloud Infrastructure

Problem Statement

When executing large‑scale Argo Workflows with hundreds of parallel pods, simultaneous reads and writes to a single OSS path cause bandwidth throttling, connection limits, and data contamination across tasks.

Kubernetes VolumePopulator (v1.33 GA)

VolumePopulator introduces a standard data‑fill pattern using dataSourceRef in a PersistentVolumeClaim (PVC). The populator copies data from a source (e.g., OSS) into a block storage volume before the pod mounts it, providing immediate high‑performance local I/O.
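In outline, the standard pattern looks like the following (a minimal sketch; the populator API group, kind, resource name, and StorageClass here are placeholders, not part of any specific product):

```yaml
# A PVC whose dataSourceRef points at a custom populator resource.
# The populator controller fills the volume with the source data
# before any pod is allowed to mount it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prefilled-data            # placeholder name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-disk     # placeholder StorageClass
  resources:
    requests:
      storage: 20Gi
  dataSourceRef:                  # standard field, GA in Kubernetes v1.33
    apiGroup: example.com         # placeholder populator API group
    kind: DataPopulator           # placeholder populator kind
    name: sample-source
```

Any populator implementing this contract plugs into the same `dataSourceRef` mechanism; the ACK implementation below is one such populator for OSS.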

ACK Implementation – OSSVolumePopulator

Alibaba Cloud Container Service for Kubernetes (ACK) supplies the CRD OSSVolumePopulator (apiVersion storage.alibabacloud.com/v1beta1) that implements the generic populator for Alibaba Cloud OSS.

OSSVolumePopulator Spec Example

apiVersion: storage.alibabacloud.com/v1beta1
kind: OSSVolumePopulator
metadata:
  name: generic-demo
  namespace: argo
spec:
  bucket: my-test-bucket
  region: cn-hangzhou
  endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  path: /many-files/
  mode: generic
  generic:
    labels:
      alibabacloud.com/acs: "true"
      alibabacloud.com/compute-class: "general-purpose"
    rrsaConfigs:
      roleArn: "acs:ram::123456789012:role/oss-populator"
      oidcProviderArn: "acs:ram::123456789012:oidc-provider/my-oidc-provider"
  # throughput: 1000   # optional MBps limit

Argo Workflow Integration

Define an ephemeral.volumeClaimTemplate that references the OSSVolumePopulator. Each parallel replica receives a distinct PVC that is pre‑filled with the OSS data.

Workflow Template Example

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-data-process-with-ossvp-
  namespace: argo
spec:
  arguments:
    parameters:
    - name: number
      value: "2"
  entrypoint: main
  volumes:
  - name: scratch-volume
    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            diskType: scratch-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: "alicloud-disk-essd"
          resources:
            requests:
              storage: 20Gi
          dataSourceRef:
            apiGroup: storage.alibabacloud.com
            kind: OSSVolumePopulator
            name: generic-demo
  templates:
  - name: main
    dag:
      tasks:
      - name: echo-task
        template: echo-template
        arguments:
          parameters:
          - name: index
            value: "{{item}}"
        withSequence:
          count: "{{workflow.parameters.number}}"
  - name: echo-template
    inputs:
      parameters:
      - name: index
    container:
      image: mirrors-ssl.aliyuncs.com/busybox:latest
      command: ["sh","-c"]
      args:
      - |
        echo "🚀 task {{inputs.parameters.index}} start"
        touch /scratch-volume/{{inputs.parameters.index}}-logs
        ls /scratch-volume
        echo "✅ task {{inputs.parameters.index}} done"
      volumeMounts:
      - name: scratch-volume
        mountPath: /scratch-volume
      resources:
        limits:
          cpu: "4"
          memory: 16Gi
        requests:
          cpu: "4"
          memory: 16Gi

Result Verification & Cleanup

Pod logs show the pre‑filled OSS files alongside a per‑task log file, confirming data isolation. After the workflow finishes, the ephemeral PVCs are deleted together with their pods; the associated cloud disks are reclaimed and billing stops.

Operational Recommendations

Compute placement: Run pre‑fill pods on serverless pools or Spot instances to reduce cost.

Throughput throttling: Use the optional throughput field or a sidecar to cap MBps per task.

Reclaim policy: Set the StorageClass reclaimPolicy: Delete so disks are removed immediately after PVC deletion.
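For the reclaim‑policy recommendation, a StorageClass along these lines deletes the disk as soon as its PVC is gone (a sketch matching the alicloud-disk-essd class used in the workflow example; verify the provisioner and parameter names against the ACK CSI documentation for your cluster version):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-essd
provisioner: diskplugin.csi.alibabacloud.com   # ACK disk CSI driver
parameters:
  type: cloud_essd                             # ESSD disk type
reclaimPolicy: Delete            # remove the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```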

Deployment Checklist for ACK

Install components: Deploy Argo Workflows and the ACK storage‑operator in the cluster.

Enable features: In the storage‑operator configuration, set enableVolumePopulator: true and enableVolumePopulatorPodHandler: true.

RRSA permission: Grant the cluster‑level RRSA role that allows the populator controller to read from the specified OSS bucket.
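The feature switches from the checklist might look as follows in the storage‑operator configuration (an illustrative fragment only; the surrounding structure of the configuration depends on the storage‑operator version, so check the key placement against the ACK documentation):

```yaml
# storage-operator feature flags (illustrative placement)
enableVolumePopulator: "true"            # turn on the populator controller
enableVolumePopulatorPodHandler: "true"  # let the populator manage its fill pods
```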

References

Argo Workflows documentation: https://argoproj.github.io/workflows/

Kubernetes Volume Populator GA (v1.33): https://kubernetes.io/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/

Argo Volumes guide: https://argo-workflows.readthedocs.io/en/latest/walk-through/volumes/

Alibaba Cloud OSSVolumePopulator guide: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/prefetch-oss-data-into-high-performance-volumes-on-demand

Argo Workflows component on ACK: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/batch-task-orchestration/

ACK storage‑operator component: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-storage-operator-to-deploy-and-upgrade-storage-components

RRSA authentication guide: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-rrsa-to-authorize-pods-to-access-different-cloud-services
