How Argo Workflows Tame Unpredictable AI Agents for Scalable Production
At KubeCon NA, experts showed that combining deterministic Argo Workflows with large‑model AI agents lets teams orchestrate smart, flexible agents in a predictable, observable, and auditable way, enabling large‑scale CVE remediation and self‑healing operations on Kubernetes.
Background
Large‑model AI agents are increasingly deployed in production, but their probabilistic outputs make control, observability and auditability difficult. At KubeCon North America, practitioners demonstrated that the deterministic nature of Argo Workflows can be used to contain and manage the uncertainty of AI agents.
Key Concepts
Agents represent uncertainty: they generate probabilistic results, excel at exploring ambiguous problems, and are not 100% predictable.
Workflows represent determinism: they define explicit steps, ordering, conditions, retries and rollback, turning a task into a standardized, observable, auditable pipeline.
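To make the contrast concrete, here is a minimal Python sketch of a deterministic wrapper around a non-deterministic agent call. It is not taken from the talks: the function names, the validation rule, and the sample outputs are all made up for illustration. The point is only that the retry budget, the validation check, and the per-attempt log stay fixed, even though the agent's output does not.

```python
import random
from typing import Callable, List, Tuple


def run_agent_step(
    agent: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Run a probabilistic agent inside a fixed retry budget.

    Returns the first output that passes validation, plus an audit log
    with one entry per attempt. Raises RuntimeError if the budget runs out.
    """
    audit_log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        output = agent()
        ok = is_valid(output)
        audit_log.append(f"attempt={attempt} valid={ok} output={output!r}")
        if ok:
            return output, audit_log
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def flaky_agent() -> str:
    # Hypothetical stand-in for a model call: sometimes the answer is unusable.
    return random.choice(["PATCH libfoo", "not sure"])


if __name__ == "__main__":
    try:
        result, log = run_agent_step(flaky_agent, lambda s: s.startswith("PATCH"))
        print(result)
        print("\n".join(log))
    except RuntimeError as err:
        print(err)
```

In Argo the same roles would be played by the workflow engine itself (retry settings, step status, logs) rather than by application code; the sketch only mirrors the idea in a few lines.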
Workflow‑Orchestrated Agents (JFrog & Root.io)
In a CVE remediation pipeline, a scheduled Argo Workflow triggers a research-agent template. The agent receives its input parameters (CVE list, model version, environment variables) through the template's inputs section, performs vulnerability discovery and analysis, and generates a report. The subsequent steps (packaging, container image rebuild, and deployment) are defined as separate workflow nodes. Failure handling is expressed with retryStrategy, which retries failed agent steps, and onExit handlers, which run rollback or cleanup logic when the workflow finishes.
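From the container's side, the agent step could look roughly like the Python entrypoint below. This is a sketch under assumptions, not the JFrog or Root.io implementation: the `--cve-list` and `--model` flags, the `analyze` stub, and the report layout are hypothetical, and the workflow would have to pass its input parameters to the container as arguments for this to work. The only detail taken from the pipeline description is that the report ends up in a file (here `/tmp/report.json`) that the workflow can read as an output parameter.

```python
import argparse
import json
from pathlib import Path
from typing import Dict, List


def analyze(cve_ids: List[str], model: str) -> Dict[str, object]:
    """Placeholder for the agent's research step (hypothetical).

    A real agent would call a model here; this stub only echoes its inputs
    so the surrounding plumbing can be exercised.
    """
    return {
        "model": model,
        "findings": [{"cve": cve, "status": "needs-review"} for cve in cve_ids],
    }


def main(argv: List[str], report_path: str = "/tmp/report.json") -> Dict[str, object]:
    parser = argparse.ArgumentParser(description="research-agent entrypoint sketch")
    parser.add_argument("--cve-list", required=True, help="comma-separated CVE IDs")
    parser.add_argument("--model", required=True, help="model version label")
    args = parser.parse_args(argv)

    # Split the comma-separated list and drop empty entries.
    cve_ids = [c.strip() for c in args.cve_list.split(",") if c.strip()]
    report = analyze(cve_ids, args.model)

    # Write the report where the workflow expects to find its output parameter.
    Path(report_path).write_text(json.dumps(report, indent=2))
    return report
```

Inside the container this would be invoked as `main(sys.argv[1:])`. Keeping the agent's contract this narrow (parameters in, one report file out) is what lets the workflow treat it as just another step.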
Typical workflow snippet:
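```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cve-remediation-
spec:
  entrypoint: main
  # Values are supplied at submit time.
  arguments:
    parameters:
      - name: cve-list
      - name: model
  templates:
    - name: main
      steps:
        - - name: discover
            template: research-agent
            # The template declares inputs, so the step must pass them.
            arguments:
              parameters:
                - name: cve-list
                  value: "{{workflow.parameters.cve-list}}"
                - name: model
                  value: "{{workflow.parameters.model}}"
        # The pack and deploy templates are omitted from this snippet.
        - - name: package
            template: pack
        - - name: deploy
            template: deploy
    - name: research-agent
      container:
        image: "myorg/research-agent:{{inputs.parameters.model}}"
      inputs:
        parameters:
          - name: cve-list
          - name: model
      outputs:
        parameters:
          - name: report
            valueFrom:
              path: /tmp/report.json
      retryStrategy:
        limit: 3
        retryPolicy: "Always"
```

Agent‑Orchestrated Workflows (Salesforce)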
Salesforce operates over 1,400 Kubernetes clusters and millions of pods. They built a multi‑agent system (On‑Call, Kubectl, Analysis agents) that evaluates alerts, queries historical metrics, and decides which operational action to take. Rather than letting agents execute kubectl commands directly, each decision triggers a predefined Argo Workflow that performs the concrete operation (pod restart, config update, node scaling). This design enforces RBAC, audit logging, and deterministic rollback.
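One way to picture the "decision triggers a predefined workflow" constraint is an allow-list that maps agent decisions to approved workflow templates. The Python sketch below is an illustration under assumptions, not Salesforce's code: the `ALLOWED_ACTIONS` mapping, the `build_workflow` helper, and the `ops` namespace are hypothetical. It builds a Workflow manifest that points at an existing WorkflowTemplate through `workflowTemplateRef` and rejects anything outside the allow-list.

```python
import json
from typing import Dict

# Hypothetical allow-list: agent action name -> pre-approved WorkflowTemplate name.
ALLOWED_ACTIONS: Dict[str, str] = {
    "restart-pod": "pod-restart",
}


def build_workflow(action: str, parameters: Dict[str, str], namespace: str = "ops") -> dict:
    """Translate an agent decision into a Workflow manifest.

    Unknown actions are rejected, so the agent can only trigger
    operations that already exist as reviewed WorkflowTemplates.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} is not an approved workflow")
    template = ALLOWED_ACTIONS[action]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{template}-", "namespace": namespace},
        "spec": {
            # Reuse the reviewed template instead of inlining any commands.
            "workflowTemplateRef": {"name": template},
            "arguments": {
                "parameters": [
                    {"name": key, "value": value}
                    for key, value in sorted(parameters.items())
                ]
            },
        },
    }


if __name__ == "__main__":
    manifest = build_workflow("restart-pod", {"deployment": "checkout"})
    # JSON is valid YAML, so this output can be saved to a file and submitted.
    print(json.dumps(manifest, indent=2))
```

Because the agent only ever produces an action name and a parameter map, the permissions that matter belong to the workflow's service account rather than to the agent.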
Example agent decision logic (pseudo‑code):
```python
if alert.severity >= HIGH:
    action = "restart-pod"
else:
    action = "scale-node"
schedule_workflow(action, parameters)
```

Corresponding workflow template:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: pod-restart
spec:
  entrypoint: restart
  templates:
    - name: restart
      container:
        image: bitnami/kubectl
        command: ["kubectl", "rollout", "restart", "deployment/{{inputs.parameters.deployment}}"]
      inputs:
        parameters:
          - name: deployment
```

Argo Workflows Overview
Argo Workflows is an open‑source, container‑native workflow engine for Kubernetes. It supports DAG and step‑based execution, parallelism, artifact passing, and built‑in retry/timeout policies. Workflows are defined as Kubernetes custom resources, so they can be versioned in Git and managed with GitOps, and they can be submitted through the Argo Server API or via the CLI (argo submit).
Practical Takeaways
Encapsulating AI agents inside deterministic Argo Workflows provides predictable execution, observability (via workflow logs and metrics), and auditability (workflow manifests are versioned, and every run is recorded as a Kubernetes resource).
Both orchestration directions are viable: workflows can invoke agents as container steps, and agents can act as decision engines that schedule workflows.
Key implementation patterns include: passing model version and parameters through inputs, using retryStrategy for transient AI failures, and defining reusable workflow templates for common operational actions.
References
Argo Workflows GitHub – https://github.com/argoproj/argo-workflows
KubeCon NA session “GitOps for AI Agents” – https://kccncna2025.sched.com/event/27FfB/gitops-for-ai-agents-building-reliable-ai-pipelines-with-argo-benji-kalman-rootio-shiran-melamed-jfrog
Salesforce self‑healing AIOps talk – https://kccncna2025.sched.com/event/27FVk/1000-clusters-1-brain-salesforces-approach-to-self-healing-using-aiops
