Deploy Kubeflow Pipelines on Alibaba Cloud with Kustomize: Step‑by‑Step Guide
Learn how to overcome the complexities of machine‑learning workflow management by installing Kubeflow Pipelines on Alibaba Cloud using Kustomize, including prerequisites, TLS setup, persistent storage with SSD disks, image registry replacement, and deployment verification steps.
Introduction
Machine‑learning projects often suffer from long workflow chains, uncontrolled data versions, difficult experiment tracking, and high model‑iteration cost. Kubeflow Pipelines (KFP) provides a portable, reproducible ML pipeline engine on Kubernetes.
What is Kubeflow Pipelines?
Management console for running and tracking experiments.
Argo‑based workflow engine that can execute multiple ML steps.
Python SDK for defining custom pipeline components.
Goals of Kubeflow Pipelines
End‑to‑end task orchestration : supports direct, scheduled, event‑driven, or data‑change triggers.
Simple experiment management : enables rapid iteration and smooth transition to production.
Component reuse : reusable pipelines and components accelerate solution building.
Running KFP on Alibaba Cloud
The official KFP distribution assumes Google Cloud services, which creates two challenges for Chinese users:
Complex default component set and Ksonnet‑based installation.
Deep coupling with Google Cloud prevents deployment on other clouds or bare‑metal.
Alibaba Cloud Container Service offers a Kustomize‑based deployment that replaces Google‑specific images with Alibaba mirrors and persists MySQL and MinIO data on SSD cloud disks.
Prerequisites
Install kustomize. See https://github.com/kubernetes-sigs/kustomize.git for the latest release.
Have a Kubernetes cluster created in Alibaba Cloud Container Service.
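As a quick preflight, the sketch below reports which of the prerequisite tools are not yet on the PATH. The function name missing_tools is hypothetical, and the tool list is an assumption based on the two prerequisites above; extend it if you want to check git, openssl, or htpasswd as well.

```shell
#!/bin/sh
# missing_tools TOOL...: print the name of every listed command
# that cannot be found on PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two prerequisites named in this guide.
missing="$(missing_tools kustomize kubectl)"
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

An empty result only shows that the binaries exist; it does not confirm that kubectl can reach the cluster.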
Installation Steps
SSH into the Kubernetes cluster (refer to Alibaba Cloud documentation for access).
Clone the deployment repository:
yum install -y git
git clone --recursive https://github.com/aliyunContainerService/kubeflow-aliyun
Generate a self‑signed TLS certificate (or use an existing one):
yum install -y openssl
domain="pipelines.kubeflow.org"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.key \
-out kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt \
-subj "/CN=$domain/O=$domain"
If you already have a TLS certificate, copy the key to kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.key and the certificate to kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt.
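Whether the certificate is freshly generated or copied in, it can be worth confirming that tls.crt and tls.key belong together before building the manifests. The following is a minimal sketch, not part of the original procedure; cert_matches_key is a hypothetical helper that compares the public key embedded in the certificate with the one derived from the private key.

```shell
#!/bin/sh
# cert_matches_key CERT KEY: succeed only if the certificate's public key
# equals the public key derived from the private key.
cert_matches_key() {
    cert_pub="$(openssl x509 -in "$1" -noout -pubkey)" || return 1
    key_pub="$(openssl pkey -in "$2" -pubout)" || return 1
    [ "$cert_pub" = "$key_pub" ]
}

# Inspect the files from the step above, if they are present.
dir=kubeflow-aliyun/overlays/ack-auto-clouddisk
if [ -f "$dir/tls.crt" ] && [ -f "$dir/tls.key" ]; then
    # Show the subject (CN/O) and the expiry date of the certificate.
    openssl x509 -in "$dir/tls.crt" -noout -subject -enddate
    if cert_matches_key "$dir/tls.crt" "$dir/tls.key"; then
        echo "certificate and key match"
    else
        echo "certificate and key do NOT match"
    fi
fi
```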
Create an admin password for the UI:
yum install -y httpd-tools
htpasswd -c kubeflow-aliyun/overlays/ack-auto-clouddisk/auth admin
Generate the deployment YAML with Kustomize:
cd kubeflow-aliyun/
kustomize build overlays/ack-auto-clouddisk > /tmp/ack-auto-clouddisk.yaml
Replace region and zone identifiers to match your cluster (example for Hangzhou):
sed -i.bak 's/regionid: cn-beijing/regionid: cn-hangzhou/g' /tmp/ack-auto-clouddisk.yaml
sed -i.bak 's/zoneid: cn-beijing-e/zoneid: cn-hangzhou-g/g' /tmp/ack-auto-clouddisk.yaml
Replace the default Google container registry with Alibaba's mirror:
sed -i.bak 's/gcr\.io/registry.aliyuncs.com/g' /tmp/ack-auto-clouddisk.yaml
Adjust persistent‑disk size if needed (e.g., 200 Gi):
sed -i.bak 's/storage: 100Gi/storage: 200Gi/g' /tmp/ack-auto-clouddisk.yaml
Validate the generated YAML:
kubectl create --validate=true --dry-run=true -f /tmp/ack-auto-clouddisk.yaml
On kubectl 1.18 and later, the boolean form of this flag is deprecated; use --dry-run=client instead.
Deploy the pipelines:
kubectl create -f /tmp/ack-auto-clouddisk.yaml
Expose the UI via an Ingress and note its external IP. The address 112.124.193.271 used here is only a placeholder. Access the console at http://112.124.193.271/pipeline/ and log in with the admin credentials created earlier.
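The region, zone, registry, and disk-size substitutions from the steps above can be gathered into one parameterized helper and run before validation and deployment. This is a sketch under the assumption that the generated manifest still contains the defaults shown earlier (cn-beijing, cn-beijing-e, gcr.io, 100Gi); the function names customize_manifest and remaining_defaults are hypothetical.

```shell
#!/bin/sh
# customize_manifest FILE REGION ZONE SIZE
# Apply all four substitutions in one sed pass, keeping a FILE.bak backup.
customize_manifest() {
    file="$1" region="$2" zone="$3" size="$4"
    sed -i.bak \
        -e "s/regionid: cn-beijing/regionid: $region/g" \
        -e "s/zoneid: cn-beijing-e/zoneid: $zone/g" \
        -e 's/gcr\.io/registry.aliyuncs.com/g' \
        -e "s/storage: 100Gi/storage: $size/g" \
        "$file"
}

# remaining_defaults FILE
# Print (with line numbers) any line that still mentions a default
# region or registry. Exit status is non-zero when nothing is left.
remaining_defaults() {
    grep -n -e 'cn-beijing' -e 'gcr\.io' "$1"
}

# Example with the Hangzhou values used in this guide (not run here):
#   customize_manifest /tmp/ack-auto-clouddisk.yaml cn-hangzhou cn-hangzhou-g 200Gi
#   remaining_defaults /tmp/ack-auto-clouddisk.yaml || echo "no defaults left"
```

Escaping the dot in gcr\.io keeps sed from treating it as "any character", so only the literal registry host is rewritten.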
Verification
After deployment, list the Ingress to confirm the address:
kubectl get ing -n kubeflow
The UI should be reachable, allowing you to create and run ML pipelines.
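If the UI does not respond, checking pod status is a reasonable next step. The sketch below filters the output of kubectl get pods for pods that are not fully ready; not_ready_pods is a hypothetical helper, and it assumes the default column order NAME, READY, STATUS.

```shell
#!/bin/sh
# not_ready_pods: read `kubectl get pods --no-headers` rows on stdin and
# print the names of pods that are neither Completed nor fully ready.
not_ready_pods() {
    awk '{
        if ($3 == "Completed") next          # finished jobs are fine
        split($2, r, "/")                    # READY column, e.g. "1/1"
        if ($3 != "Running" || r[1] != r[2]) print $1
    }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods -n kubeflow --no-headers | not_ready_pods
#
# The Ingress address can also be read directly with a JSONPath query:
#   kubectl get ing -n kubeflow \
#     -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}'
```

An empty result from not_ready_pods means every pod in the namespace is either Running with all containers ready or Completed.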
Cleanup
To remove the installation, delete the resources and release the SSD disks:
kubectl delete -f /tmp/ack-auto-clouddisk.yaml
# Then release the associated cloud disks via the Alibaba Cloud console.
Q&A
Why use Alibaba SSD cloud disks? They support automatic snapshots, protecting pipeline metadata.
How to back up the disks? Create manual snapshots or configure an automatic snapshot policy.
How to uninstall Kubeflow Pipelines? Delete the deployed resources (as shown in the Cleanup section) and release the cloud disks.
Can I use an existing disk for the databases? Yes, refer to the documentation for mounting pre‑existing volumes.
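After an uninstall, it may help to see which persistent volumes were left behind before releasing disks in the console. The following sketch lists volumes in the Released phase whose claim came from a given namespace; released_volumes is a hypothetical helper, and whether any volumes remain depends on the reclaim policy of the storage class, which this guide does not specify.

```shell
#!/bin/sh
# released_volumes NAMESPACE: read "NAME STATUS CLAIM_NAMESPACE" rows on
# stdin and print the volumes that are Released and were claimed from
# NAMESPACE.
released_volumes() {
    awk -v ns="$1" '$2 == "Released" && $3 == ns { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NS:.spec.claimRef.namespace \
#     | released_volumes kubeflow
```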
Summary
This guide described the challenges of ML workflow management, introduced Kubeflow Pipelines, and provided a complete Kustomize‑based procedure to deploy KFP on Alibaba Cloud with TLS, persistent SSD storage, and a replaced container registry.
Alibaba Cloud Native