Orchestrating Gene Computation Workflows with Argo Workflows
This article explains how to use the Kubernetes-native Argo Workflows engine to automate and scale complex gene-computing pipelines, detailing its advantages, challenges, and a step-by-step BWA alignment workflow example on Alibaba Cloud’s ACK platform.
Gene computing is data-intensive: data volumes keep growing, and analyses must be both efficient and reproducible. Automating the pipeline with a container-friendly engine such as Argo Workflows is a practical way to link the analysis steps together.
A gene‑computing workflow strings together tasks such as data preprocessing, sequence alignment, variant detection, expression analysis, and phylogenetic tree construction into an ordered pipeline.
Argo Workflows, an open‑source Kubernetes‑native engine, offers containerization for environment consistency, flexible DAG‑based orchestration, conditional logic, and parallel execution, making it well‑suited for the complex, multi‑stage pipelines typical in genomics.
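To make the DAG idea concrete, here is a minimal Python sketch (not Argo's own scheduler) that groups the stages listed above into waves that could run in parallel. The dependency edges are assumptions chosen for illustration, and a real pipeline may wire these stages differently. It relies on `graphlib` from the standard library, so it assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between the stages named above:
# each key runs only after every stage in its value set has finished.
PIPELINE = {
    "preprocessing": set(),
    "alignment": {"preprocessing"},
    "variant_detection": {"alignment"},
    "expression_analysis": {"alignment"},
    "phylogenetic_tree": {"variant_detection"},
}


def execution_waves(graph):
    """Group stages into waves; stages in one wave have all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Waves are a simplification: a DAG engine can start a task as soon as that task's own dependencies finish, without waiting for the rest of the previous wave.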
However, large-scale deployment brings challenges: users often lack deep cluster-operations expertise, very large job counts can strain a self-managed open-source installation, and resource-aware scheduling and elastic scaling are hard to get right.
Alibaba Cloud’s ACK One team addresses these issues with a serverless, distributed Argo cluster that runs on Elastic Container Instances (ECI), optimizes Kubernetes parameters for high‑throughput scheduling, and leverages preemptible instances to reduce cost.
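Preemptible instances can be reclaimed before a task finishes, which is one reason the workflow later in this article sets a `retryStrategy` with `limit: 3` on each template. The following Python sketch illustrates that retry behavior under the assumption that `limit` counts retries rather than total attempts, so a task may run up to four times. `run_with_retries` and `FlakyTask` are hypothetical names for illustration, not part of Argo.

```python
def run_with_retries(task, limit=3):
    """Call task(); on failure retry up to `limit` more times, then re-raise.

    Returns a (result, attempts) tuple on success.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            # attempts - 1 retries have been used so far
            if attempts > limit:
                raise


class FlakyTask:
    """Hypothetical task that fails a fixed number of times before succeeding,
    standing in for a pod whose preemptible instance was reclaimed."""

    def __init__(self, failures):
        self.remaining_failures = failures

    def __call__(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("instance reclaimed")
        return "ok"


if __name__ == "__main__":
    result, attempts = run_with_retries(FlakyTask(failures=2), limit=3)
    print(f"result={result} after {attempts} attempts")
```

A task that fails more often than the limit allows still fails the step, so retries reduce the impact of preemption without hiding persistent errors.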
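Below is a concrete BWA alignment workflow. After creating a distributed Argo cluster and mounting an OSS volume, the following YAML defines three stages: bwaprepare (downloads the reference and reads, then builds the BWA index), bwamap (aligns the two read files in parallel), and bwaindex (combines the alignments into a SAM file, converts it to BAM, then sorts, indexes, and views the result). A DAG template ties the stages together: withItems fans bwamap out over both read files, and each template's retryStrategy allows up to three retries.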
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: bwa-oss-
spec:
  entrypoint: bwa-oss
  arguments:
    parameters:
    - name: fastqFolder            # working directory under the OSS mount
      value: /gene
    - name: reference              # reference assembly
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/subset_assembly.fa.gz
    - name: fastq1                 # paired-end reads, file 1
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/SRR1976948_1.fastq.gz
    - name: fastq2                 # paired-end reads, file 2
      value: https://ags-public.oss-cn-beijing.aliyuncs.com/alignment/SRR1976948_2.fastq.gz
  volumes:
  - name: ossdir                   # OSS-backed volume shared by all stages
    persistentVolumeClaim:
      claimName: pvc-oss
  templates:
  # Stage 1: download the inputs, keep the first 800000 lines of each read file, build the BWA index.
  - name: bwaprepare
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
      - mkdir -p /bwa{{workflow.parameters.fastqFolder}}; cd /bwa{{workflow.parameters.fastqFolder}}; rm -rf SRR1976948*; wget {{workflow.parameters.reference}}; wget {{workflow.parameters.fastq1}}; wget {{workflow.parameters.fastq2}}; gzip -d subset_assembly.fa.gz; gunzip -c SRR1976948_1.fastq.gz | head -800000 > SRR1976948.1; gunzip -c SRR1976948_2.fastq.gz | head -800000 > SRR1976948.2; bwa index subset_assembly.fa
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  # Stage 2: align one read file against the reference; runs once per item.
  - name: bwamap
    inputs:
      parameters:
      - name: object
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
      - cd /bwa{{workflow.parameters.fastqFolder}}; bwa aln subset_assembly.fa {{inputs.parameters.object}} > {{inputs.parameters.object}}.untrimmed.sai
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  # Stage 3: build the paired-end SAM, convert to BAM, sort, index, and view.
  - name: bwaindex
    container:
      image: registry.cn-beijing.aliyuncs.com/geno/alltools:v0.2
      command: [sh,-c]
      args:
      - cd /bwa{{workflow.parameters.fastqFolder}}; bwa sampe subset_assembly.fa SRR1976948.1.untrimmed.sai SRR1976948.2.untrimmed.sai SRR1976948.1 SRR1976948.2 > SRR1976948.untrimmed.sam; samtools import subset_assembly.fa SRR1976948.untrimmed.sam SRR1976948.untrimmed.sam.bam; samtools sort SRR1976948.untrimmed.sam.bam -o SRR1976948.untrimmed.sam.bam.sorted.bam; samtools index SRR1976948.untrimmed.sam.bam.sorted.bam; samtools tview SRR1976948.untrimmed.sam.bam.sorted.bam subset_assembly.fa -p k99_13588:1000 -d T
      volumeMounts:
      - name: ossdir
        mountPath: /bwa
    retryStrategy:
      limit: 3
  # Entrypoint: the DAG that orders the three stages.
  - name: bwa-oss
    dag:
      tasks:
      - name: bwaprepare
        template: bwaprepare
      - name: bwamap
        template: bwamap
        dependencies: [bwaprepare]
        arguments:
          parameters:
          - name: object
            value: "{{item}}"
        withItems: ["SRR1976948.1","SRR1976948.2"]
      - name: bwaindex
        template: bwaindex
        dependencies: [bwamap]
After submitting the workflow, users can check its status via the ACK console; successful completion yields alignment files stored in the OSS bucket, as shown in the accompanying screenshots.
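To see what the bwamap fan-out amounts to, the sketch below renders the bwamap command once per `withItems` entry. It is a simplified stand-in for Argo's template substitution, handling only plain `{{name}}` placeholders; the command string and values are copied from the workflow above, while `render` and `expand_bwamap` are hypothetical helper names.

```python
import re

# Values copied from the workflow above.
WORKFLOW_PARAMETERS = {"fastqFolder": "/gene"}
ITEMS = ["SRR1976948.1", "SRR1976948.2"]
BWAMAP_COMMAND = (
    "cd /bwa{{workflow.parameters.fastqFolder}}; "
    "bwa aln subset_assembly.fa {{inputs.parameters.object}} "
    "> {{inputs.parameters.object}}.untrimmed.sai"
)

# Matches {{dotted.name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, values):
    """Replace each {{name}} with values[name]; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def expand_bwamap(items):
    """Yield one rendered command per item, mimicking the withItems fan-out."""
    for item in items:
        values = {
            f"workflow.parameters.{key}": value
            for key, value in WORKFLOW_PARAMETERS.items()
        }
        # In the DAG, {{item}} is passed in as the `object` input parameter.
        values["inputs.parameters.object"] = item
        yield render(BWAMAP_COMMAND, values)


if __name__ == "__main__":
    for command in expand_bwamap(ITEMS):
        print(command)
```

Each rendered command corresponds to one pod in the real workflow. Both pods write their `.sai` output to the shared OSS volume, which is how bwaindex later finds the two files.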
Overall, Argo Workflows’ containerized, flexible, and easy‑to‑use nature makes it a powerful tool for gene‑computing and other data‑intensive scientific domains, improving automation, resource utilization, and analysis throughput.
Alibaba Cloud Infrastructure