Deploy Ollama and Open-WebUI on Kubernetes: A Step‑by‑Step Guide
This article walks through deploying the open‑source LLM serving tool Ollama and the Open‑WebUI interface on a Kubernetes cluster using Helm, covering GPU considerations, configuration files, service exposure, and model management with practical code examples.
Since the beginning of this year, interest in large language models (LLMs) and their deployment on GPU infrastructure has surged, driven by advances in artificial intelligence and machine learning that demand massive compute power. Nvidia's stock has soared, and many large models have emerged; for deploying and managing these models, Ollama and Open‑WebUI are solid choices.
Ollama is an open‑source tool for deploying machine‑learning models that simplifies managing and interacting with LLMs. It serves top‑tier open‑source models such as Llama 3, Phi 3, and Mistral, and can be thought of as a Docker‑like platform focused on machine‑learning models.
Deploying a model with Ollama is as easy as using Docker, but the CLI can be cumbersome for newcomers. To alleviate this, the Open‑WebUI project provides a friendly web interface that makes model deployment easier.
For easier management, Ollama can be deployed onto a Kubernetes cluster, which provides high availability and scalability. A cluster with GPUs is ideal, but even without one the Llama 3 model runs reasonably well on CPU.
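If your cluster does have GPU nodes, you can check whether they are visible to the scheduler. This is a sketch that assumes the NVIDIA device plugin is installed, since the `nvidia.com/gpu` resource is only advertised once the plugin is running:

```shell
# List each node with its allocatable NVIDIA GPU count.
# <none> means the node exposes no nvidia.com/gpu resource.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```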
<code>$ kubectl version
Client Version: v1.28.11
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.7
</code>Deploy Ollama to Kubernetes
The Open‑WebUI project supplies a Helm chart that simplifies installing Ollama and Open‑WebUI. Add the chart repository with:
<code>helm repo add open-webui https://helm.openwebui.com/
helm repo update
</code>The chart deploys Ollama by default; you can customize settings such as GPU usage, data persistence, and more by providing your own values file.
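Before writing the values file, you can confirm the chart is visible from the newly added repository:

```shell
# Show the chart name and available versions in the open-webui repo
helm search repo open-webui
```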
<code># myvalues.yaml
ollama:
  enabled: true  # automatically install Ollama Helm chart
  ollama:
    gpu:
      enabled: false  # set to true to use GPU
  persistentVolume:
    enabled: true
    storageClass: nfs-client  # specify storage class
pipelines:
  enabled: true
  persistence:
    enabled: true
    storageClass: "nfs-client"
service:
  type: NodePort
extraEnvVars:
  - name: HF_ENDPOINT
    value: https://hf-mirror.com
</code>In this configuration you can toggle GPU support, enable persistent storage, and set the Service type to NodePort so that Open‑WebUI can be accessed via a node's IP and port. You may also configure an Ingress if preferred.
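As a sketch, an Ingress could be enabled instead of the NodePort Service by adding something like the following to the values file. The host name and ingress class here are placeholders, and you should verify the exact key names against the chart's values.yaml for your chart version:

```yaml
# Hypothetical ingress values -- confirm key names in the chart's values.yaml
service:
  type: ClusterIP
ingress:
  enabled: true
  class: nginx
  host: open-webui.example.com
```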
Note: Open‑WebUI contacts the Hugging Face model hub by default, which is inaccessible from mainland China. Set the HF_ENDPOINT environment variable to a mirror address (e.g., https://hf-mirror.com) to avoid errors.
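After the release is installed, a quick way to confirm the variable actually reached the container is to read it from the pod's environment; a sketch, using the open-webui-0 pod name reported by `kubectl get pods -n kube-ai`:

```shell
# Print HF_ENDPOINT from inside the Open-WebUI pod
kubectl exec -n kube-ai open-webui-0 -- printenv HF_ENDPOINT
```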
Install the chart with your custom values:
<code>helm upgrade --install ollama open-webui/open-webui -f myvalues.yaml --create-namespace --namespace kube-ai
</code>After deployment, several pods run in the kube-ai namespace. Check their status with:
<code>$ kubectl get pods -n kube-ai
NAME READY STATUS RESTARTS AGE
open-webui-0 1/1 Running 0 2m11s
open-webui-ollama-944dd68fc-wxsjf 1/1 Running 0 24h
open-webui-pipelines-557f6f95cd-dfgh8 1/1 Running 0 25h
</code>Because the Service is of type NodePort, you can reach Open‑WebUI at http://NodeIP:31009:
<code>$ kubectl get svc -n kube-ai
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
open-webui NodePort 10.96.1.212 <none> 80:31009/TCP 25h
open-webui-ollama ClusterIP 10.96.2.112 <none> 11434/TCP 25h
open-webui-pipelines NodePort 10.96.2.170 <none> 9099:32322/TCP 25h
</code>Usage
Open the URL http://NodeIP:31009 in a browser and register a new account to access the Open‑WebUI homepage.
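If the node's IP is not directly reachable from your workstation, port‑forwarding the Service is a simple alternative; a sketch, where the local port 8080 is an arbitrary choice:

```shell
# Forward local port 8080 to port 80 of the open-webui Service
kubectl port-forward -n kube-ai svc/open-webui 8080:80
# Then browse to http://localhost:8080
```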
If you already have an Ollama instance elsewhere, you can add it as an external connection. Configure the Ollama address in the admin panel under Settings → External Connections, then set the Ollama API URL (the Helm deployment already provides this URL).
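The in‑cluster Ollama API URL follows the usual Service DNS pattern, http://open-webui-ollama.kube-ai.svc.cluster.local:11434 for this deployment. A throwaway pod can sanity‑check it; a sketch, where curlimages/curl is just one convenient image choice:

```shell
# Run a one-off pod that queries Ollama's version endpoint, then cleans itself up
kubectl run curl-test -n kube-ai --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://open-webui-ollama:11434/api/version
```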
Switch to the Models tab to pull models such as llama3 from the Ollama library (https://ollama.com/library). Click the pull button, watch the download progress, and once it completes, select the model for chat interactions.
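The same pull can be done without the UI by running the ollama CLI inside the serving pod; a sketch, assuming the open-webui-ollama Deployment name implied by the pod listing above:

```shell
# Pull llama3 directly inside the Ollama container, then list local models
kubectl exec -n kube-ai deploy/open-webui-ollama -- ollama pull llama3
kubectl exec -n kube-ai deploy/open-webui-ollama -- ollama list
```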
Conclusion
This guide demonstrated how to deploy the Llama 3 model on a Kubernetes cluster using Ollama and Open‑WebUI. By containerizing and orchestrating the AI service, we achieved a scalable, maintainable environment with minimal manual intervention. The combination of Open‑WebUI’s simple UI and Kubernetes’ powerful automation positions this stack as a key solution for future AI‑driven applications.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.