Comparative Study of Batch Compute and Serverless Argo Workflows for Containerized Data Processing
This article compares a cloud provider's closed-source Batch compute service with the open-source, serverless Argo Workflows platform. It demonstrates how each orchestrates multi-stage containerized data-processing pipelines, covering configuration, job definitions, dependency handling, and operational trade-offs.
The rapid growth of containerization in batch processing for domains such as autonomous driving and scientific computing has led to two main solution families: proprietary Batch services offered by cloud providers and open‑source, Kubernetes‑native platforms built around Argo Workflows.
Case study: a typical data-processing workflow merges 128 files into 64, then 64 into 32, and finally computes the result in a single pod, for a total of 97 pods (64 + 32 + 1) across three stages.
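The fan-in arithmetic above can be sketched in a few lines of Python (a hypothetical helper for illustration, not part of either platform's SDK):

```python
def pod_counts(files: int, merge_stages: int) -> list[int]:
    """Pods per stage: each merge stage halves the file count and uses
    one pod per output file; a final single pod computes the result."""
    counts = []
    for _ in range(merge_stages):
        files //= 2          # 128 -> 64, then 64 -> 32
        counts.append(files)  # one pod per output file
    counts.append(1)          # final single-pod compute stage
    return counts

print(pod_counts(128, 2))     # [64, 32, 1], i.e. 97 pods in total
```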
Batch Compute Implementation
Batch is a fully managed service that runs containerized jobs at any scale. The workflow consists of creating job definitions, submitting jobs to a queue, and letting the Batch scheduler allocate CPU, memory, and GPU resources.
Job definition example (process‑data):
{
  "type": "container",
  "containerProperties": {
    "command": ["python", "process.py"],
    "image": "python:3.11-amd",
    "resourceRequirements": [
      {"type": "VCPU", "value": "1.0"},
      {"type": "MEMORY", "value": "2048"}
    ],
    "runtimePlatform": {"cpuArchitecture": "X86_64", "operatingSystemFamily": "LINUX"},
    "networkConfiguration": {"assignPublicIp": "DISABLED"},
    "executionRoleArn": "role::xxxxxxx"
  },
  "platformCapabilities": ["Serverless Container"],
  "jobDefinitionName": "process-data"
}

Jobs are submitted with array properties to launch multiple pods (e.g., 64 for the first stage and 32 for the second). Dependencies are expressed via the dependsOn field, and the job IDs returned at submission time are used to chain dependent jobs.
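As a sketch of how the submission layer chains the three stages, the following builds SubmitJob-style request bodies. The helper and the placeholder job IDs are illustrative; field names mirror the arrayProperties/dependsOn shape described above rather than a specific vendor SDK:

```python
def submit_payload(name, definition, array_size=None, depends_on=None):
    """Build a SubmitJob-style request body (illustrative field names)."""
    payload = {"jobName": name, "jobDefinition": definition}
    if array_size:                 # array job: one pod per array index
        payload["arrayProperties"] = {"size": array_size}
    if depends_on:                 # run only after these jobs complete
        payload["dependsOn"] = [{"jobId": j} for j in depends_on]
    return payload

# Stage 1: 64 pods merge 128 files into 64 (no dependencies).
stage1 = submit_payload("merge-l1", "process-data", array_size=64)
# Stage 2: 32 pods, gated on stage 1's job ID (returned by the service).
stage2 = submit_payload("merge-l2", "process-data", array_size=32,
                        depends_on=["<stage1-job-id>"])
# Stage 3: a single pod computes the final result.
stage3 = submit_payload("compute", "process-data",
                        depends_on=["<stage2-job-id>"])
```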
Argo Workflows Implementation
Serverless Argo Workflows is a fully managed Alibaba Cloud service built on the open‑source Argo project. It runs on Kubernetes using Elastic Container Instances (ECI) and supports complex dependency graphs, retries, and parallelism.
Workflow definition (YAML) example:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: process-data-
spec:
  entrypoint: main
  volumes:
    - name: workdir
      persistentVolumeClaim:
        claimName: pvc-oss
  arguments:
    parameters:
      - name: numbers
        value: "64"
  templates:
    - name: main
      steps:
        - - name: process-data-l1
            template: process-data
            arguments:
              parameters:
                - name: file_number
                  value: "{{item}}"
                - name: level
                  value: "1"
            withSequence:
              count: "{{workflow.parameters.numbers}}"
        - - name: process-data-l2
            template: process-data
            arguments:
              parameters:
                - name: file_number
                  value: "{{item}}"
                - name: level
                  value: "2"
            withSequence:
              count: "{{=asInt(workflow.parameters.numbers)/2}}"
        - - name: merge-data
            template: merge-data
            arguments:
              parameters:
                - name: number
                  value: "{{=asInt(workflow.parameters.numbers)/2}}"
    - name: process-data
      inputs:
        parameters:
          - name: file_number
          - name: level
      container:
        image: argo-workflows-registry.cn-hangzhou.cr.aliyuncs.com/argo-workflows-demo/python:3.11-amd
        command: [python3]
        args: ["process.py", "{{inputs.parameters.file_number}}", "{{inputs.parameters.level}}"]
        volumeMounts:
          - name: workdir
            mountPath: /mnt/vol
    - name: merge-data
      inputs:
        parameters:
          - name: number
      container:
        image: argo-workflows-registry.cn-hangzhou.cr.aliyuncs.com/argo-workflows-demo/python:3.11-amd
        command: [python3]
        args: ["merge.py", "0", "{{inputs.parameters.number}}"]
        volumeMounts:
          - name: workdir
            mountPath: /mnt/vol

The same workflow can also be built programmatically using the Python SDK, defining Container objects for each task and assembling them with Steps.
Comparison
Both Batch and Serverless Argo provide robust support for containerized batch workloads, but they differ in flexibility, vendor lock-in, and control. Batch offers a tightly integrated, easy-to-use experience within the cloud provider's ecosystem, while Argo Workflows gives greater customization, portability across Kubernetes clusters, and fine-grained dependency management.
Conclusion
Choosing between the two depends on the team's familiarity with Kubernetes, the need for custom workflow logic, and the acceptable degree of dependence on a specific cloud vendor. For Kubernetes-centric teams requiring high customization, Argo Workflows is preferable; for teams seeking a simple, fully managed solution tightly coupled with other cloud services, Batch Compute is the better fit.