
Introduction to Kubeflow and Its Installation Process

This article introduces Kubeflow, explains the typical machine‑learning model lifecycle, outlines Kubeflow’s core components and the advantages it draws from Kubernetes, details the server and storage configuration, walks through the ksonnet and Kubeflow installation steps, and shows how to verify the deployment and access the Kubeflow UI.

360 Tech Engineering

Kubeflow is a Google‑initiated machine‑learning platform built on Kubernetes that enables definition of resources such as TFJob and allows distributed model training to be managed like regular application deployments.
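As an illustration of that idea, a TFJob is declared like any other Kubernetes resource and the tf-operator schedules its replicas much like an ordinary deployment. The manifest below is a hypothetical minimal example: the image, replica count, training script, and API version are assumptions based on Kubeflow v0.3-era defaults, not taken from the article.

```shell
# Write a hypothetical minimal TFJob manifest to a local file.
cat > tfjob-example.yaml <<'EOF'
apiVersion: kubeflow.org/v1alpha2
kind: TFJob
metadata:
  name: mnist-demo
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: tensorflow
            image: tensorflow/tensorflow:1.11.0   # illustrative image
            command: ["python", "/opt/train.py"]  # hypothetical training script
EOF
# Submit it against your cluster:
#   kubectl apply -f tfjob-example.yaml
```

Once applied, the operator creates the worker pods and manages their lifecycle, so distributed training can be monitored and torn down with the same kubectl workflow as any application.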

Before Kubeflow, a production ML workflow typically passed through data cleaning and validation, dataset splitting, training, model validation, large‑scale training, model export, service deployment, and monitoring, with each stage requiring additional tooling for data collection, feature extraction, resource management, and logging.

Core Kubeflow components include:

Jupyter: multi‑tenant notebook service

TensorFlow/PyTorch: supported ML training engines

Seldon: model deployment on Kubernetes

TF‑Serving: online TensorFlow model serving with version control

Argo: workflow engine

Ambassador: API gateway

Istio: service management and telemetry

ksonnet: tooling for deploying Kubernetes resources

Kubeflow leverages Kubernetes advantages such as native resource isolation, automated cluster management, CPU/GPU scheduling, support for distributed storage, and mature monitoring and alerting.

Server configuration

GPU: NVIDIA Tesla K80

Network: 1 GbE (note: may become a bottleneck for large datasets)

CephFS service configuration

Network: 10 GbE (Ceph clusters should be co‑located with Kubernetes to avoid high latency).

Kubeflow installation prerequisites

Kubernetes version: v1.12.2 (kube‑dns required)

Kubeflow version: v0.3.2

Jsonnet version: v0.11.2
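Before installing, it is worth confirming the versions above. A small helper along these lines can run the checks; the kube-dns label selector and flag choices are common defaults, not taken from the article.

```shell
# Check the install prerequisites listed above. Run against your cluster.
check_prereqs() {
  kubectl version --short                              # expect server v1.12.2
  kubectl -n kube-system get pods -l k8s-app=kube-dns  # kube-dns must be Running
  ks version                                           # ksonnet CLI, once installed
}
# check_prereqs
```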

Install ksonnet

(Installation steps shown in the accompanying image.)
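Since the original screenshot is not reproduced here, the sketch below shows one common way to install the ksonnet CLI from a GitHub release. The release version and platform string are assumptions; match them to your environment.

```shell
# Build the download URL for a ksonnet release tarball.
KS_VER="0.11.0"                               # assumed release version
KS_TARBALL="ks_${KS_VER}_linux_amd64.tar.gz"  # assumed platform string
KS_URL="https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_TARBALL}"
echo "fetching ${KS_URL}"

# Run on the target host to unpack and put `ks` on the PATH:
#   curl -fsSL -o "${KS_TARBALL}" "${KS_URL}"
#   tar -xzf "${KS_TARBALL}"
#   sudo mv "ks_${KS_VER}_linux_amd64/ks" /usr/local/bin/ks
#   ks version
```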

Install Kubeflow
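The v0.3-era install is driven by ksonnet. The function below sketches the typical sequence; the registry path and package name follow the upstream kubeflow/kubeflow repository conventions for v0.3.x, but verify them against the release you actually use.

```shell
# Create a ksonnet app, pull in the Kubeflow core package, and apply it.
install_kubeflow() {
  version="${1:-v0.3.2}"
  ks init kubeflow-app && cd kubeflow-app || return 1
  ks registry add kubeflow "github.com/kubeflow/kubeflow/tree/${version}/kubeflow"
  ks pkg install "kubeflow/core@${version}"
  ks generate kubeflow-core kubeflow-core
  ks apply default -c kubeflow-core
}
# install_kubeflow v0.3.2   # run against your cluster
```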

After completing all installation steps, verify the deployment status of Kubeflow resources in the Kubernetes cluster. Check the status of each Deployment and its Pods to ensure they are running correctly.
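A verification helper along these lines waits for each Deployment to finish rolling out; the namespace "kubeflow" is an assumed default install target.

```shell
# Wait until every Deployment in the namespace reports a completed rollout.
check_kubeflow() {
  ns="${1:-kubeflow}"
  for d in $(kubectl -n "${ns}" get deployments -o name); do
    kubectl -n "${ns}" rollout status "${d}" --timeout=120s || return 1
  done
  echo "all deployments in ${ns} are ready"
}
# check_kubeflow kubeflow   # run against your cluster
```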

Use Ambassador as the unified external gateway; forward its service port with kubectl port-forward to access Kubeflow locally.
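Concretely, the port forward looks like this; the service name, namespace, and port mapping are the Kubeflow v0.3 defaults, so adjust them if your deployment differs.

```shell
# Map localhost:8080 to the Ambassador gateway's HTTP port.
kubeflow_ui() {
  kubectl port-forward -n kubeflow svc/ambassador 8080:80
}
# kubeflow_ui   # run against your cluster, then open http://localhost:8080
```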

Access the Kubeflow UIs at localhost:8080 in a browser. From there you can use Jupyter Notebook for end‑to‑end development, run code and view results, and submit jobs to the tf‑operator for distributed TensorFlow training.

Conclusion

Major cloud providers and hardware vendors are investing in Kubeflow to enable large‑scale, multi‑GPU training on Kubernetes, improving GPU utilization and streamlining the ML workflow, while presenting new challenges for DevOps teams.

Tags: machine learning, model deployment, Kubernetes, AI Platform, Kubeflow
Written by

360 Tech Engineering

Official tech channel of 360, building the most professional technology aggregation platform for the brand.
