Deploy Kubeflow Pipelines on Alibaba Cloud with Kustomize: Step‑by‑Step Guide

Learn how to overcome the complexities of machine‑learning workflow management by installing Kubeflow Pipelines on Alibaba Cloud using Kustomize, including prerequisites, TLS setup, persistent storage with SSD disks, image registry replacement, and deployment verification steps.

Alibaba Cloud Native

Introduction

Machine‑learning projects often suffer from long workflow chains, uncontrolled data versions, difficult experiment tracking, and high model‑iteration cost. Kubeflow Pipelines (KFP) provides a portable, reproducible ML pipeline engine on Kubernetes.

What is Kubeflow Pipelines?

KFP consists of three main parts:

A management console for running and tracking experiments.

An Argo‑based workflow engine that executes multi‑step ML workflows.

A Python SDK for defining pipelines and custom components.

Goals of Kubeflow Pipelines

End‑to‑end task orchestration: supports direct, scheduled, event‑driven, or data‑change triggers.

Simple experiment management: enables rapid iteration and smooth transition to production.

Component reuse: reusable pipelines and components accelerate solution building.

Running KFP on Alibaba Cloud

The official KFP distribution assumes Google Cloud services, which creates two challenges for users in China:

Complex default component set and Ksonnet‑based installation.

Deep coupling with Google Cloud prevents deployment on other clouds or bare‑metal.

Alibaba Cloud Container Service offers a Kustomize‑based deployment that replaces Google‑specific images with Alibaba mirrors and persists MySQL and MinIO data on SSD cloud disks.

Prerequisites

Install kustomize. See https://github.com/kubernetes-sigs/kustomize.git for the latest release.

Have a Kubernetes cluster created in Alibaba Cloud Container Service.
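Before starting, it can help to confirm that the command-line tools used later in this guide are on the PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical, and the tool list is simply the set of commands that appear in the steps below.

```shell
#!/bin/sh
# Print the name of every tool passed as an argument that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used by the installation steps in this guide.
missing=$(missing_tools git openssl htpasswd kustomize kubectl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

Anything reported as missing can be installed with the yum commands shown in the steps that follow, or from the tool's own release page.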

Installation Steps

SSH into the Kubernetes cluster (refer to Alibaba Cloud documentation for access).

Clone the deployment repository:

yum install -y git
git clone --recursive https://github.com/aliyunContainerService/kubeflow-aliyun

Generate a self‑signed TLS certificate (or use an existing one):

yum install -y openssl
domain="pipelines.kubeflow.org"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.key \
  -out kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt \
  -subj "/CN=$domain/O=$domain"

If you already have a TLS cert, copy the key to kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.key and the cert to kubeflow-aliyun/overlays/ack-auto-clouddisk/tls.crt.
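Whether the files were generated or copied in, a quick check that the certificate and key belong together can save a failed deployment later. The sketch below assumes an RSA key (as produced by the command above); the helper name cert_matches_key is hypothetical, and the modulus comparison would not apply to an ECDSA key.

```shell
#!/bin/sh
# Return 0 if the certificate ($1) and the RSA private key ($2) share the same modulus.
cert_matches_key() {
  crt_mod=$(openssl x509 -noout -modulus -in "$1") || return 1
  key_mod=$(openssl rsa -noout -modulus -in "$2") || return 1
  [ "$crt_mod" = "$key_mod" ]
}

dir=kubeflow-aliyun/overlays/ack-auto-clouddisk
if [ -f "$dir/tls.crt" ] && [ -f "$dir/tls.key" ]; then
  # Show who the certificate was issued to and when it expires.
  openssl x509 -noout -subject -enddate -in "$dir/tls.crt"
  if cert_matches_key "$dir/tls.crt" "$dir/tls.key"; then
    echo "key matches certificate"
  else
    echo "key does NOT match certificate"
  fi
fi
```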

Create an admin password for the UI:

yum install -y httpd-tools
htpasswd -c kubeflow-aliyun/overlays/ack-auto-clouddisk/auth admin

Generate the deployment YAML with Kustomize:

cd kubeflow-aliyun/
kustomize build overlays/ack-auto-clouddisk > /tmp/ack-auto-clouddisk.yaml

Replace region and zone identifiers to match your cluster (example for Hangzhou):

sed -i.bak 's/regionid: cn-beijing/regionid: cn-hangzhou/g' /tmp/ack-auto-clouddisk.yaml
sed -i.bak 's/zoneid: cn-beijing-e/zoneid: cn-hangzhou-g/g' /tmp/ack-auto-clouddisk.yaml
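The region and zone substitution can also be wrapped in a small function that takes the target values as arguments and reports how many lines were affected. This is only a sketch: the function name set_region_zone is hypothetical, and it assumes the generated manifest still contains the cn-beijing defaults shown above. Running both expressions in a single sed call means the .bak file holds the manifest exactly as it was before this step.

```shell
#!/bin/sh
# Rewrite regionid/zoneid in a manifest ($1) to the given region ($2) and zone ($3),
# then print the number of lines that now carry the new values.
set_region_zone() {
  manifest=$1 region=$2 zone=$3
  sed -i.bak \
    -e "s/regionid: cn-beijing/regionid: $region/g" \
    -e "s/zoneid: cn-beijing-e/zoneid: $zone/g" "$manifest"
  grep -c -e "regionid: $region" -e "zoneid: $zone" "$manifest"
}

# Example for Hangzhou; runs only if the generated manifest exists.
if [ -f /tmp/ack-auto-clouddisk.yaml ]; then
  set_region_zone /tmp/ack-auto-clouddisk.yaml cn-hangzhou cn-hangzhou-g
fi
```

A count of 0 suggests the defaults in the manifest differ from the ones assumed here and should be inspected by hand.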

Replace the default Google container registry with Alibaba's mirror:

sed -i.bak 's/gcr\.io/registry.aliyuncs.com/g' /tmp/ack-auto-clouddisk.yaml
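After the registry substitution, listing the images the manifest references makes it easy to spot anything that still points at gcr.io. The sketch below uses a hypothetical helper, list_images, and assumes each image appears on its own "image:" line, which is the usual layout of kustomize output. Whether every image is actually available under the mirror is not something this check can confirm.

```shell
#!/bin/sh
# Print the distinct container images referenced by a manifest ($1), one per line.
list_images() {
  grep -E '^[[:space:]-]*image:' "$1" \
    | sed -E 's/^[[:space:]-]*image:[[:space:]]*//' \
    | tr -d "\"'" \
    | sort -u
}

manifest_file=/tmp/ack-auto-clouddisk.yaml
if [ -f "$manifest_file" ]; then
  list_images "$manifest_file"
  # Flag any image that still references a gcr.io host.
  list_images "$manifest_file" | grep 'gcr\.io' || echo "no gcr.io images remain"
fi
```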

Adjust persistent‑disk size if needed (e.g., 200 Gi):

sed -i.bak 's/storage: 100Gi/storage: 200Gi/g' /tmp/ack-auto-clouddisk.yaml

Validate the generated YAML:

kubectl create --validate=true --dry-run=true -f /tmp/ack-auto-clouddisk.yaml
# On kubectl 1.18 and later, use --dry-run=client; the boolean form --dry-run=true is deprecated.
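Alongside the dry run, a per-kind summary of the generated manifest gives a quick picture of what is about to be created. This is a minimal sketch with a hypothetical helper, count_kinds; it counts only top-level "kind:" lines, so nested fields such as a roleRef kind are ignored.

```shell
#!/bin/sh
# Print "<Kind> <count>" for every top-level resource kind in a manifest ($1), sorted by kind.
count_kinds() {
  awk '/^kind:/ { n[$2]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

if [ -f /tmp/ack-auto-clouddisk.yaml ]; then
  count_kinds /tmp/ack-auto-clouddisk.yaml
fi
```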

Deploy the pipelines:

kubectl create -f /tmp/ack-auto-clouddisk.yaml

The UI is exposed through an Ingress. Note its external IP address, referred to here as <INGRESS_IP>, then access the console at http://<INGRESS_IP>/pipeline/ and log in with the admin credentials created earlier.

Verification

After deployment, list the Ingress to confirm the address:

kubectl get ing -n kubeflow

The UI should then be reachable, allowing you to create and run ML pipelines.
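The check can be taken one step further by listing the pods and printing the console URL directly. The sketch below makes several assumptions: the helper console_url is hypothetical, the kubeflow namespace holds a single Ingress, and the load balancer reports an IP rather than a hostname. The kubectl part is skipped when kubectl is not installed.

```shell
#!/bin/sh
# Build the console URL from an Ingress address ($1); the scheme ($2) defaults to http.
console_url() {
  printf '%s://%s/pipeline/\n' "${2:-http}" "$1"
}

if command -v kubectl >/dev/null 2>&1; then
  # All pods in the namespace should eventually reach the Running state.
  kubectl get pods -n kubeflow --request-timeout=10s
  # Read the IP of the first Ingress in the namespace (assumes there is exactly one).
  addr=$(kubectl get ing -n kubeflow --request-timeout=10s \
    -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
  if [ -n "$addr" ]; then
    console_url "$addr"
  else
    echo "Ingress has no address yet; retry in a moment."
  fi
fi
```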

Cleanup

To remove the installation, delete the resources and release the SSD disks:

kubectl delete -f /tmp/ack-auto-clouddisk.yaml
# Then release the associated cloud disks via the Alibaba Cloud console.

Q&A

Why use Alibaba SSD cloud disks? They support automatic snapshots, protecting pipeline metadata.

How to back up the disks? Create manual snapshots or configure an automatic snapshot policy.

How to uninstall Kubeflow Pipelines? Delete the deployed resources (as shown above), then release the associated cloud disks.

Can I use an existing disk for the databases? Yes, refer to the documentation for mounting pre‑existing volumes.

Summary

This guide described the challenges of ML workflow management, introduced Kubeflow Pipelines, and provided a complete Kustomize‑based procedure to deploy KFP on Alibaba Cloud with TLS, persistent SSD storage, and a replaced container registry.

Figure: Kubeflow Pipelines architecture diagram
Figure: Kubeflow Pipelines UI screenshot
