How Triton Enables Seamless Container Deployment on Alibaba Cloud ACK
This article explains how Zhangmen built the Triton platform on Alibaba Cloud ACK and OpenKruise to automate Kubernetes container releases, detailing the design of the DeployFlow CRD, batch‑wise rollout strategies, canary deployments, UI interactions, and the operators that power continuous delivery.
Background
Kubernetes and containers are now mainstream in production environments. To reduce infrastructure cost and improve delivery speed, a containerization project was launched in April 2020 using Alibaba Cloud Container Service (ACK) as the Kubernetes runtime and the open‑source OpenKruise project as the workload engine.
Workload Choice
The platform, named Triton, adopts OpenKruise CloneSet as the primary workload type. CloneSet extends native Deployments with in‑place updates, ordered rollout, and parallel or canary strategies, supporting both stateless and stateful services.
DeployFlow CRD Design
The core resource is the DeployFlow custom resource definition (CRD) that describes a release configuration and its runtime status.
type DeployFlow struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   DeployFlowSpec   `json:"spec,omitempty"`
    Status DeployFlowStatus `json:"status,omitempty"`
}
DeployFlowSpec contains two groups of fields:
Application metadata (AppID, GroupID, AppName, replica count, etc.)
Release strategy parameters (action, UpdateStrategy, NonUpdateStrategy, etc.)
type DeployFlowSpec struct {
    AppID   int    `json:"appID"`
    GroupID int    `json:"groupID"`
    AppName string `json:"appName"`
    // ... other fields ...
    Action            string                   `json:"action"`
    UpdateStrategy    *DeployUpdateStrategy    `json:"updateStrategy,omitempty"`
    NonUpdateStrategy *DeployNonUpdateStrategy `json:"nonUpdateStrategy,omitempty"`
}
UpdateStrategy handles operations that modify the CloneSet’s UpdateRevision (create, update, rollback). NonUpdateStrategy covers actions that do not change the revision, such as scaling or restarting pods.
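The split between the two strategies can be illustrated with a small validation routine. This is only a sketch: the action string values, the empty placeholder strategy types, and the Validate and changesRevision helpers are assumptions made for illustration, since the article lists the operations but not how the Operator checks them.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical action names; the article names the operations
// but does not give their string values.
const (
	ActionCreate   = "create"
	ActionUpdate   = "update"
	ActionRollback = "rollback"
	ActionScale    = "scale"
	ActionRestart  = "restart"
)

// Placeholders for the real strategy types, whose fields are not shown.
type DeployUpdateStrategy struct{}
type DeployNonUpdateStrategy struct{}

// Spec is a trimmed-down stand-in for DeployFlowSpec.
type Spec struct {
	Action            string
	UpdateStrategy    *DeployUpdateStrategy
	NonUpdateStrategy *DeployNonUpdateStrategy
}

// changesRevision reports whether an action modifies the
// CloneSet's UpdateRevision.
func changesRevision(action string) (bool, error) {
	switch action {
	case ActionCreate, ActionUpdate, ActionRollback:
		return true, nil
	case ActionScale, ActionRestart:
		return false, nil
	default:
		return false, fmt.Errorf("unknown action %q", action)
	}
}

// Validate checks that the strategy matching the action is present.
func Validate(s Spec) error {
	updates, err := changesRevision(s.Action)
	if err != nil {
		return err
	}
	if updates && s.UpdateStrategy == nil {
		return errors.New("updateStrategy is required for " + s.Action)
	}
	if !updates && s.NonUpdateStrategy == nil {
		return errors.New("nonUpdateStrategy is required for " + s.Action)
	}
	return nil
}

func main() {
	ok := Spec{Action: ActionUpdate, UpdateStrategy: &DeployUpdateStrategy{}}
	bad := Spec{Action: ActionScale} // missing NonUpdateStrategy
	fmt.Println("update:", Validate(ok))
	fmt.Println("scale:", Validate(bad))
}
```

Keeping the two strategies as separate optional pointers means a single DeployFlow object can describe either kind of operation, with the action field deciding which one is read.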
BaseStrategy Fields
BatchSize: maximum number of pods per rollout batch.
Paused / Canceled: allow pausing, resuming, or aborting a release.
Mode: auto or manual batch triggering.
BatchIntervalSeconds: interval between automatic batches.
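As a rough illustration of how BatchSize shapes a rollout, the sketch below splits a replica count into batches. The BaseStrategy field types and the PlanBatches helper are assumptions; the article does not show how Triton actually computes its batches.

```go
package main

import "fmt"

// BaseStrategy mirrors the fields described above.
// The Go types chosen here are assumptions.
type BaseStrategy struct {
	BatchSize            int
	Paused               bool
	Canceled             bool
	Mode                 string // "auto" or "manual"
	BatchIntervalSeconds int
}

// PlanBatches splits replicas into batches of at most BatchSize pods.
// It returns nil when there is nothing to roll out or BatchSize is invalid.
func PlanBatches(replicas int, s BaseStrategy) []int {
	if replicas <= 0 || s.BatchSize <= 0 {
		return nil
	}
	var batches []int
	for remaining := replicas; remaining > 0; remaining -= s.BatchSize {
		n := s.BatchSize
		if remaining < n {
			n = remaining // last batch may be smaller
		}
		batches = append(batches, n)
	}
	return batches
}

func main() {
	s := BaseStrategy{BatchSize: 3, Mode: "auto", BatchIntervalSeconds: 30}
	fmt.Println(PlanBatches(10, s)) // prints [3 3 3 1]
}
```

In auto mode the controller would wait BatchIntervalSeconds between these batches; in manual mode each batch would wait for an explicit trigger.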
Batch Phase Definitions
BatchPending: Pods are being scheduled or images are pulling.
BatchSmoking: Pods are running but not yet ready.
BatchSmoked: All pods are running; containers are not marked Ready, so traffic is not routed.
BatchBaking: Pods are gradually added to service endpoints (traffic pull‑in).
BatchBaked: Pods are fully Ready and receiving production traffic.
SmokeFailed / BakeFailed: Failure during smoking or baking.
When BaseStrategy.canary is set, Triton pauses after BatchSmoked, allowing manual verification before proceeding to BatchBaking, thus implementing a classic canary deployment.
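The phase sequence and the canary pause can be read as a small state machine. The sketch below is one possible encoding, not the Operator's actual code: the Phase type, the Next function, and the approved flag (standing in for the manual verification step) are assumptions.

```go
package main

import "fmt"

// Phase names follow the batch phases listed above.
type Phase string

const (
	BatchPending Phase = "BatchPending"
	BatchSmoking Phase = "BatchSmoking"
	BatchSmoked  Phase = "BatchSmoked"
	BatchBaking  Phase = "BatchBaking"
	BatchBaked   Phase = "BatchBaked"
	SmokeFailed  Phase = "SmokeFailed"
	BakeFailed   Phase = "BakeFailed"
)

// Next returns the phase that follows cur on the success path.
// With canary set, a batch stays in BatchSmoked until approved is true.
// BatchBaked and the failure phases are treated as terminal.
func Next(cur Phase, canary, approved bool) Phase {
	switch cur {
	case BatchPending:
		return BatchSmoking
	case BatchSmoking:
		return BatchSmoked
	case BatchSmoked:
		if canary && !approved {
			return BatchSmoked // hold for manual verification
		}
		return BatchBaking
	case BatchBaking:
		return BatchBaked
	default:
		return cur
	}
}

func main() {
	p := BatchPending
	for p != BatchBaked {
		fmt.Println(p)
		// Approve the canary as soon as the batch reaches BatchSmoked.
		p = Next(p, true, p == BatchSmoked)
	}
	fmt.Println(p)
}
```

Without approval, Next keeps returning BatchSmoked, which corresponds to the paused state in which pods run but receive no traffic.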
Architecture Overview
Triton runs on top of ACK and OpenKruise. The DeployFlow CRD is processed by a custom Operator that implements the release logic. Additional controllers include:
Event controller – forwards logs from Pods, CloneSets, and DeployFlows to Elasticsearch.
ReadinessGates controller – manages custom readiness gates for traffic pull‑in/pull‑out.
REST and gRPC APIs are exposed for external integration.
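To make the role of the ReadinessGates controller more concrete: in Kubernetes, a pod counts as Ready only when its containers are ready and every condition listed in its readiness gates is True. The sketch below models that rule with plain Go types instead of the Kubernetes API types; the Pod struct and the gate name example.com/traffic-on are hypothetical.

```go
package main

import "fmt"

// Pod is a simplified stand-in for the Kubernetes pod status fields
// relevant to readiness gates.
type Pod struct {
	ContainersReady bool
	ReadinessGates  []string        // condition types the pod declares as gates
	Conditions      map[string]bool // condition type -> status is True
}

// IsReady applies the readiness rule: all containers ready and
// every declared gate condition True. A missing condition counts as not True.
func IsReady(p Pod) bool {
	if !p.ContainersReady {
		return false
	}
	for _, gate := range p.ReadinessGates {
		if !p.Conditions[gate] {
			return false
		}
	}
	return true
}

func main() {
	const gate = "example.com/traffic-on" // hypothetical gate name
	pod := Pod{
		ContainersReady: true,
		ReadinessGates:  []string{gate},
		Conditions:      map[string]bool{},
	}
	fmt.Println("before pull-in:", IsReady(pod))
	pod.Conditions[gate] = true // a controller flips the gate to pull traffic in
	fmt.Println("after pull-in:", IsReady(pod))
}
```

Because endpoints only include Ready pods, a controller that owns such a condition can pull traffic in or out without restarting the container.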
Graceful Shutdown
When Kubernetes terminates a pod, it waits up to terminationGracePeriodSeconds (default 30 s) before forcibly killing the containers. Triton adds a preStop hook that runs /gracefully_shutdown within that window to deregister the service from external registries before the container is killed.
spec:
  containers:
  - name: demo-container
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "/gracefully_shutdown"]
Release Process
Canary Smoking: Deploy the canary batch and verify functionality.
Canary Baking: Pull traffic to the canary while the release remains paused for observation.
Rollout: Continue with remaining batches, either automatically or manually.
Completion: All batches reach BatchBaked and the release status becomes successful.
The UI updates in real time, showing pod counts for each phase and allowing manual triggers when in manual mode.
Targeted Pod Operations
Triton supports safe pod restarts and scaling by leveraging CloneSet’s ability to add a new pod, verify its health, and then delete the target pod, avoiding the blunt kubectl delete pod approach.
Summary and Outlook
Triton has become a core capability of the continuous‑delivery platform, handling thousands of container migrations and supporting large‑scale pipelines. Current limitations include:
Handling long‑lived socket connections.
Cross‑application release orchestration.
Faster provisioning of local test environments.
Ongoing work aims to address these gaps.
Alibaba Cloud Native