Deploying Ollama and Open‑WebUI on Kubernetes with Helm
This guide explains how to deploy the open‑source LLM serving tool Ollama and the Open‑WebUI front‑end on a Kubernetes cluster using Helm charts, covering GPU configuration, persistent storage, service exposure, and model selection for large language models.
With the rise of large language models (LLMs) and the growing demand for GPU-accelerated inference, tools like Ollama and Open-WebUI have become popular choices for deploying and managing these models in production.
Ollama is an open‑source model serving platform that bundles popular models such as Llama 3, Phi 3, and Mistral, offering a Docker‑like experience focused on machine‑learning workloads.
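That Docker-like experience means models are pulled and run by name from a registry. A minimal illustration (run on a machine where the ollama binary is installed; the model tag is just an example):

```shell
# Pull a model from the Ollama library, much like docker pull
ollama pull llama3

# Run an interactive one-shot prompt against the local model
ollama run llama3 "Explain Kubernetes in one sentence."
```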
To simplify deployment, the Open-WebUI project provides a Helm chart hosted at https://helm.openwebui.com. Add the repository and update it with:
helm repo add open-webui https://helm.openwebui.com/
helm repo update

The chart can be customized via a myvalues.yaml file. An example configuration disables the GPU, enables a persistent NFS volume, and sets the service type to NodePort:
# myvalues.yaml
ollama:
  enabled: true
  ollama:
    gpu:
      enabled: false
  persistentVolume:
    enabled: true
    storageClass: nfs-client
pipelines:
  enabled: true
  persistence:
    enabled: true
    storageClass: "nfs-client"
service:
  type: NodePort
extraEnvVars:
  - name: HF_ENDPOINT
    value: https://hf-mirror.com

Deploy the chart with:
helm upgrade --install ollama open-webui/open-webui -f myvalues.yaml --create-namespace --namespace kube-ai

After installation, check the pods in the kube-ai namespace:
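If the cluster has NVIDIA GPUs and the NVIDIA device plugin installed, the same values file can enable GPU-accelerated inference instead. A sketch of the relevant overrides (key names should be verified against the chart version in use):

```yaml
# myvalues-gpu.yaml — assumes the NVIDIA device plugin is running on the cluster
ollama:
  ollama:
    gpu:
      enabled: true
      type: nvidia   # assumption: the bundled ollama subchart selects the GPU vendor here
      number: 1      # request one GPU for the Ollama pod
```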
$ kubectl get pods -n kube-ai
NAME                                    READY   STATUS    RESTARTS   AGE
open-webui-0                            1/1     Running   0          2m11s
open-webui-ollama-944dd68fc-wxsjf       1/1     Running   0          24h
open-webui-pipelines-557f6f95cd-dfgh8   1/1     Running   0          25h

The NodePort service exposes Open-WebUI at http://<node-ip>:31009. Open the UI, register an account, and verify the Ollama API endpoint, which the chart configures automatically.
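The assigned NodePort can be confirmed from the service object, and port-forwarding offers local access without exposing a node port at all. The service name below is an assumption based on the release name used in this guide:

```shell
# Print the node port mapped to the web UI service
kubectl get svc open-webui -n kube-ai -o jsonpath='{.spec.ports[0].nodePort}'

# Alternative: forward the service to localhost:8080 for local access
kubectl port-forward -n kube-ai svc/open-webui 8080:80
```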
In the UI, add an external connection if Ollama runs elsewhere, then open the Model tab to pull models from https://ollama.com/library (e.g., llama3). Once the download completes, select the model to start chatting.
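Models can also be pulled without the UI, either via the Ollama CLI inside the pod or via Ollama's REST API (port 11434 is Ollama's default). The deployment name here is an assumption inferred from the pod listing above:

```shell
# Pull llama3 directly inside the Ollama container
kubectl exec -n kube-ai deploy/open-webui-ollama -- ollama pull llama3

# Or, if curl is available in the image, call the Ollama pull API
kubectl exec -n kube-ai deploy/open-webui-ollama -- \
  curl -s http://localhost:11434/api/pull -d '{"name": "llama3"}'
```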
In summary, by containerizing Ollama and Open‑WebUI and orchestrating them with Kubernetes and Helm, you obtain a scalable, maintainable AI‑powered chatbot deployment that leverages cloud‑native best practices and simplifies LLM management.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.