How Kubernetes Powers Scalable AI: Building an End‑to‑End Machine Learning Platform

This article explores how Kubernetes, enhanced by KubeSphere and serverless technologies, enables efficient AI workloads through GPU virtualization, multi‑cluster management, secure data sandboxes, automated testing, and scalable storage, illustrating a complete lifecycle from data ingestion to model inference.


Artificial Intelligence and Kubernetes

Industry predictions for 2021 consistently listed tighter integration of AI with Kubernetes as a top trend: Kubernetes offers excellent scalability, a distributed architecture, and powerful scheduling, making it an ideal platform for deep-learning and machine-learning workloads.

Prophecis Architecture

The Prophecis platform from WeBank runs on top of Kubernetes, with KubeSphere as the container-management layer providing storage, networking, service governance, CI/CD, and observability.

Prophecis architecture

Missing Native Capabilities for AI Workloads

User management and multi‑tenant permissions

Multi‑cluster management

Graphical GPU workload scheduling

GPU monitoring

Training and inference log management

Kubernetes events and audit logs

Alerting and notifications

Kubernetes alone does not provide these enterprise‑grade features, which are essential for a production‑ready ML platform.

KubeSphere as an Enterprise‑Grade Extension

KubeSphere sits on top of Kubernetes and adds user management, multi‑cluster control, observability, application management, micro‑service governance, and CI/CD, effectively turning Kubernetes into a modern distributed operating system.

Building the Jizhan AI Platform

The platform offers end‑to‑end AI lifecycle management: data processing, model training, testing, and inference, with low‑code development, automated testing, intelligent scheduling, and resource monitoring to boost efficiency and reduce costs.

Challenges Before Refactoring

Low GPU utilization in development environments (on average, about half of GPU capacity was wasted).

High storage operational cost with Ceph.

Data‑set security for confidential data.

High manual effort for algorithm testing.

Solutions Implemented

Adopted KubeSphere to abstract Kubernetes complexities.

Replaced Ceph with QingStor NeonSAN (NVMe SSD + 25 GbE RDMA), achieving a 5-6× IOPS improvement (see the StorageClass sketch after this list).

Implemented a data‑security sandbox to isolate datasets while allowing algorithm training.

Developed EVSdk for unified algorithm packaging, input standardization, and automated testing.
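
For reference, consuming NeonSAN from Kubernetes typically goes through a CSI StorageClass. The sketch below is illustrative only: the provisioner string and parameters are assumptions that should be checked against the QingStor NeonSAN CSI driver documentation.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: neonsan-nvme
provisioner: neonsan.csi.qingstor.com   # assumed provisioner name; verify against the NeonSAN CSI driver
reclaimPolicy: Retain                   # keep training data even if the PVC is deleted
allowVolumeExpansion: true
parameters:
  fsType: ext4                          # driver-specific parameters (e.g. pool, replicas) omitted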

GPU Virtualization

The platform uses Tencent's open-source GPUManager to virtualize GPUs, capping each container's GPU usage with only ~5% performance overhead and letting multiple containers share a single GPU safely. For comparison, a conventional whole-GPU request via the NVIDIA device plugin looks like this:

resources:
  requests:
    nvidia.com/gpu: 2   # two whole GPUs via the NVIDIA device plugin
    cpu: 8
    memory: 16Gi
  limits:
    nvidia.com/gpu: 2   # extended resources require requests == limits
    cpu: 8
    memory: 16Gi
GPU virtualization diagram
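
With GPUManager's device plugin, a container instead requests fractions of a card through the tencent.com/vcuda-core and tencent.com/vcuda-memory extended resources (per the GPUManager documentation, vcuda-core is measured in hundredths of a GPU and vcuda-memory in 256 MiB units); the values below are illustrative:

resources:
  requests:
    tencent.com/vcuda-core: 50     # 50/100 of one GPU's compute
    tencent.com/vcuda-memory: 16   # 16 × 256 MiB ≈ 4 GiB of GPU memory
  limits:
    tencent.com/vcuda-core: 50
    tencent.com/vcuda-memory: 16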

Training Cluster Scheduling

Jobs are created with explicit GPU requests; combined with a message queue, the training cluster achieves near‑full GPU utilization.
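
A minimal sketch of such a training job using the standard batch/v1 Job API (the image name and command are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: train-resnet50
spec:
  backoffLimit: 0                  # let the queue, not Kubernetes, decide on retries
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/ml/train:latest   # illustrative image
          command: ["python", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 2    # for extended resources, requests default to limits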

Resource Monitoring

KubeSphere’s custom monitoring panels track CPU, GPU, and project‑level usage, allowing administrators to set quotas per project and per user.

GPU resource monitoring
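
The quotas themselves can be enforced with standard Kubernetes ResourceQuota objects, one per project namespace; extended resources such as GPUs are capped via the requests.nvidia.com/gpu key. A minimal sketch (namespace and values illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-project-quota
  namespace: ml-project-a          # one namespace per project
spec:
  hard:
    requests.cpu: "64"
    requests.memory: 256Gi
    requests.nvidia.com/gpu: "8"   # caps total GPUs requested in this namespace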

Secure Data Sandbox

The sandbox isolates external clusters from the internet, preventing data leakage while permitting controlled data transfer to developer environments via network policies.
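
A minimal sketch of such a policy (namespace and label names are illustrative): deny all traffic by default for sandbox pods, then allow only flows to and from labeled developer namespaces.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-isolation
  namespace: data-sandbox           # illustrative sandbox namespace
spec:
  podSelector: {}                   # applies to every pod in the sandbox
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: dev-environment # only labeled developer namespaces may reach in
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              role: dev-environment # and sandbox pods may only reach out to them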

Automated Testing Framework

EVSdk defines a unified algorithm interface, standardizes inputs, and supports multiple model formats. Templates and routing paths extract specific fields (e.g., age) from JSON/XML outputs for comparison against ground‑truth annotations.

route_path: $.people[0].age.value
Automated testing flow
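
To make the idea concrete, a test case could pair a route path with a ground-truth annotation roughly like this (a hypothetical schema for illustration only; the article does not show EVSdk's actual format):

name: age-estimation-case-001       # hypothetical test-case definition
input: images/person_001.jpg
route_path: $.people[0].age.value   # JSONPath into the model's JSON output
expected: 34                        # ground-truth annotation
tolerance: 2                        # pass if |predicted - expected| <= 2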

Serverless for AI

AI workloads benefit from serverless by reducing data‑processing costs, triggering training jobs on events, serving models as functions, and handling inference results via event‑driven functions.

OpenFunction Overview

OpenFunction is an open‑source cloud‑native FaaS platform built on top of Kubernetes. It consists of Build (converts code to container images), Serving (scalable function execution), and Events (connects external event sources).

OpenFunction architecture

OpenFunction leverages Cloud Native Buildpacks, Dapr, Knative Serving, and KEDA to provide both synchronous and asynchronous function runtimes, with extensible event sources (Kafka, NATS, PubSub, S3, GitHub) and customizable event buses.

EventSource: integrates external event producers.

EventBus: pluggable message‑queue backbone.

Trigger: filters events and invokes functions.
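
As a concrete example, here is a minimal Function resource modeled on OpenFunction's v1beta1 samples (the image, registry, and secret names are illustrative; check the current CRD for field changes):

apiVersion: core.openfunction.io/v1beta1
kind: Function
metadata:
  name: hello-world-go
spec:
  version: "v1.0.0"
  image: "registry.example.com/functions/hello-world-go:v1"  # target image for the build
  imageCredentials:
    name: push-secret               # illustrative registry credential secret
  build:
    builder: openfunction/builder-go:latest
    env:
      FUNC_NAME: "HelloWorld"       # entry-point function in the source code
    srcRepo:
      url: "https://github.com/OpenFunction/samples.git"
      sourceSubPath: "functions/knative/hello-world-go"
  serving:
    runtime: "knative"              # synchronous runtime; "async" uses Dapr + KEDA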

Future Outlook

The roadmap includes tighter GPU scheduling support in KubeSphere v3.2, a plug‑in architecture in v4.0, and industry‑specific low‑code suites that let end users adapt algorithms to their own data without writing code.

Tags: serverless, machine learning, AI, Kubernetes, GPU virtualization, KubeSphere
Written by Qingyun Technology Community

Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!
