
Orchestrating Gene Computation Workflows with Argo Workflows

This article explains how to use the Kubernetes-native Argo Workflows engine to automate and scale complex gene-computing pipelines, detailing its advantages, challenges, and a step-by-step BWA alignment workflow example on Alibaba Cloud’s ACK platform.

Alibaba Cloud Infrastructure

In gene computing, a data-intensive field, researchers face rapidly growing data volumes and must keep their analyses efficient and reproducible. Workflow automation with a container-friendly engine such as Argo Workflows has therefore become essential for linking the individual analysis steps.

A gene‑computing workflow strings together tasks such as data preprocessing, sequence alignment, variant detection, expression analysis, and phylogenetic tree construction into an ordered pipeline.
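
To make the idea of an ordered pipeline concrete, here is a minimal Python sketch that models the steps named above as a dependency graph and derives one valid run order with the standard-library graphlib module (Python 3.9+). The dependency edges are illustrative assumptions, not a prescribed genomics design; a workflow engine resolves the same kind of graph when it executes a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between the pipeline steps: each key maps to the
# set of steps whose outputs it consumes.
PIPELINE: dict[str, set[str]] = {
    "preprocessing": set(),
    "sequence_alignment": {"preprocessing"},
    "variant_detection": {"sequence_alignment"},
    "expression_analysis": {"sequence_alignment"},
    "phylogenetic_tree": {"variant_detection"},
}


def execution_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return one order in which every step runs after all of its inputs."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(PIPELINE)))
```

Several orders can be valid (for example, expression_analysis may run before or after variant_detection); the sorter only guarantees that no step precedes its inputs.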

Argo Workflows, an open‑source Kubernetes‑native engine, offers containerization for environment consistency, flexible DAG‑based orchestration, conditional logic, and parallel execution, making it well‑suited for the complex, multi‑stage pipelines typical in genomics.
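
DAG-based orchestration is also what enables parallel execution: any tasks whose dependencies are all satisfied can run at the same time. The following plain-Python sketch (not Argo code) groups a hypothetical two-sample pipeline, with invented task names, into waves of mutually independent tasks. Argo itself starts each task as soon as its own dependencies finish rather than in lockstep waves; the grouping only makes the available parallelism visible.

```python
from graphlib import TopologicalSorter

# Hypothetical two-sample pipeline: each key maps to the tasks it waits for.
DEPENDENCIES: dict[str, set[str]] = {
    "align_sample_a": {"qc_sample_a"},
    "align_sample_b": {"qc_sample_b"},
    "joint_variant_calling": {"align_sample_a", "align_sample_b"},
}


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within one wave do not depend on each
    other, so an engine could run them concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic input
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)  # simulate the whole wave finishing successfully
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```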

However, large-scale deployment brings its own challenges: users may lack deep cluster-operations expertise, very large numbers of jobs can push a self-managed open-source engine to its limits, and scaling resources elastically to match the workload remains difficult.

Alibaba Cloud’s ACK One team addresses these issues with a serverless, distributed Argo cluster that runs on Elastic Container Instances (ECI), optimizes Kubernetes parameters for high‑throughput scheduling, and leverages preemptible instances to reduce cost.

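Below is a concrete BWA alignment workflow. After you create a distributed Argo cluster and mount an OSS bucket through a persistent volume claim named pvc-oss, the following YAML defines three stages: bwaprepare downloads the reference assembly and the paired FASTQ files, keeps a subset of the reads, and builds the BWA index; bwamap aligns the two read files in parallel; and bwaindex pairs the alignments with bwa sampe, then converts, sorts, indexes, and displays the result with samtools. A DAG template (bwa-oss) ties the stages together and fans bwamap out over both read files, and every template sets a retryStrategy that retries a failed step up to three times.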

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: bwa-oss-
spec:
  entrypoint: bwa-oss
  arguments:
    parameters:
    - name: fastqFolder
      value: /gene
    - name: reference
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/subset_assembly.fa.gz
    - name: fastq1
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/SRR1976948_1.fastq.gz
    - name: fastq2
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/SRR1976948_2.fastq.gz
  volumes:
  - name: ossdir
    persistentVolumeClaim:
      claimName: pvc-oss
  templates:
  - name: bwaprepare
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
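      - |
        set -e
        mkdir -p /bwa{{workflow.parameters.fastqFolder}}
        cd /bwa{{workflow.parameters.fastqFolder}}
        # Remove leftovers from an earlier attempt so a retried step starts clean.
        rm -rf SRR1976948* subset_assembly.fa*
        wget {{workflow.parameters.reference}}
        wget {{workflow.parameters.fastq1}}
        wget {{workflow.parameters.fastq2}}
        gzip -d subset_assembly.fa.gz
        # Keep the first 800,000 lines (200,000 reads) of each FASTQ file to keep the demo small.
        gunzip -c SRR1976948_1.fastq.gz | head -800000 > SRR1976948.1
        gunzip -c SRR1976948_2.fastq.gz | head -800000 > SRR1976948.2
        # Build the BWA index for the reference assembly.
        bwa index subset_assembly.fa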
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  - name: bwamap
    inputs:
      parameters:
      - name: object
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
      - cd /bwa{{workflow.parameters.fastqFolder}}; bwa aln subset_assembly.fa {{inputs.parameters.object}} > {{inputs.parameters.object}}.untrimmed.sai
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  - name: bwaindex
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
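      - |
        set -e
        cd /bwa{{workflow.parameters.fastqFolder}}
        # Generate paired-end alignments (SAM) from the two .sai files and their read files.
        bwa sampe subset_assembly.fa SRR1976948.1.untrimmed.sai SRR1976948.2.untrimmed.sai SRR1976948.1 SRR1976948.2 > SRR1976948.untrimmed.sam
        # Assumes samtools 1.x in the alltools image. Convert SAM to BAM; the @SQ header
        # written by bwa sampe already names the reference sequences.
        samtools view -b -o SRR1976948.untrimmed.sam.bam SRR1976948.untrimmed.sam
        samtools sort -o SRR1976948.untrimmed.sam.bam.sorted.bam SRR1976948.untrimmed.sam.bam
        samtools index SRR1976948.untrimmed.sam.bam.sorted.bam
        # Print a text rendering of the alignments around k99_13588:1000 to the step log.
        samtools tview -d T -p k99_13588:1000 SRR1976948.untrimmed.sam.bam.sorted.bam subset_assembly.fa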
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  - name: bwa-oss
    dag:
      tasks:
      - name: bwaprepare
        template: bwaprepare
      - name: bwamap
        template: bwamap
        dependencies: [bwaprepare]
        arguments:
          parameters:
          - name: object
            value: "{{item}}"
        withItems: ["SRR1976948.1","SRR1976948.2"]
      - name: bwaindex
        template: bwaindex
        dependencies: [bwamap]
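
The withItems field on the bwamap task is what produces the parallel fan-out: Argo creates one task per list entry and substitutes the entry for {{item}}, which the task passes to the template as inputs.parameters.object. The following Python sketch imitates that expansion for the bwamap command above. The render and expand_bwamap helpers are hypothetical and only approximate Argo's template substitution.

```python
import re

# Values copied from the bwa-oss workflow above.
WORKFLOW_PARAMETERS = {"fastqFolder": "/gene"}
BWAMAP_ARGS = (
    "cd /bwa{{workflow.parameters.fastqFolder}}; bwa aln subset_assembly.fa "
    "{{inputs.parameters.object}} > {{inputs.parameters.object}}.untrimmed.sai"
)
BWAMAP_ITEMS = ["SRR1976948.1", "SRR1976948.2"]

# Matches placeholders such as {{inputs.parameters.object}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every {{name}}; an unknown name raises KeyError."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def expand_bwamap(items: list[str]) -> dict[str, str]:
    """Create one bwamap command per withItems entry (hypothetical helper)."""
    base = {f"workflow.parameters.{key}": value
            for key, value in WORKFLOW_PARAMETERS.items()}
    commands = {}
    for index, item in enumerate(items):
        values = {**base, "inputs.parameters.object": item}
        # The "(index:item)" suffix mirrors how the Argo CLI labels loop children.
        commands[f"bwamap({index}:{item})"] = render(BWAMAP_ARGS, values)
    return commands


if __name__ == "__main__":
    for name, command in expand_bwamap(BWAMAP_ITEMS).items():
        print(f"{name}: {command}")
```

Each expanded command writes its own .sai file in the shared /bwa/gene directory, which is why the two bwamap pods can run at the same time without interfering, and why bwaindex has to wait for both of them.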

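After submitting the workflow, users can check its status in the ACK console. When it completes successfully, the alignment files (including the SAM file and the sorted, indexed BAM file) are stored under the gene directory of the OSS-backed volume, as shown in the accompanying screenshots.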
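
Besides the console, the Workflow object itself records progress: status.phase holds the overall phase, and status.nodes maps node IDs to entries with fields such as displayName, type, and phase. The minimal Python sketch below summarizes pod outcomes from such an object once it has been parsed from JSON, for example the output of `kubectl get workflow <name> -o json`, where `<name>` is the generated name starting with bwa-oss-. The summarize function and its output keys are hypothetical.

```python
from collections import Counter


def summarize(workflow: dict) -> dict:
    """Summarize a parsed Argo Workflow object by the phases of its pod nodes."""
    status = workflow.get("status", {})
    # status.nodes also contains non-pod entries (e.g. the DAG node itself),
    # so keep only nodes whose type is "Pod".
    pods = [node for node in status.get("nodes", {}).values()
            if node.get("type") == "Pod"]
    return {
        "workflow_phase": status.get("phase", "Unknown"),
        "pods_by_phase": dict(Counter(node.get("phase", "Unknown") for node in pods)),
        # With retryStrategy, a failed attempt stays listed here even if a
        # later retry of the same step succeeded.
        "failed_pods": sorted(node.get("displayName", "?") for node in pods
                              if node.get("phase") in {"Failed", "Error"}),
    }
```

In a small script, calling summarize(json.load(sys.stdin)) on piped kubectl output gives a compact view that is easier to scan than the full object when many bwamap items fan out.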

Overall, Argo Workflows’ containerized, flexible, and easy‑to‑use nature makes it a powerful tool for gene‑computing and other data‑intensive scientific domains, improving automation, resource utilization, and analysis throughput.
