Deploying Ollama and Open‑WebUI on Kubernetes with Helm
This guide explains how to deploy the open‑source LLM serving tool Ollama and the Open‑WebUI front‑end on a Kubernetes cluster using Helm charts, covering GPU configuration, persistent storage, service exposure, and model selection for large language models.
With the rise of large language models (LLMs) and the growing demand for GPU-accelerated inference, tools like Ollama and Open-WebUI have become popular choices for deploying and managing these models in production.
Ollama is an open‑source model serving platform that bundles popular models such as Llama 3, Phi 3, and Mistral, offering a Docker‑like experience focused on machine‑learning workloads.
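That Docker-like experience means models are pulled and run by name from a registry. A minimal illustration (run on a machine where the ollama binary is installed; the model tag is just an example):

```shell
# Pull a model from the Ollama library, much like docker pull
ollama pull llama3

# Run an interactive one-shot prompt against the local model
ollama run llama3 "Explain Kubernetes in one sentence."
```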
To simplify deployment, the Open-WebUI project provides a Helm chart hosted at https://helm.openwebui.com. Add the repository and update it with:
helm repo add open-webui https://helm.openwebui.com/
helm repo update

The chart can be customized via a myvalues.yaml file. An example configuration disables the GPU, enables a persistent NFS volume, and sets the service type to NodePort:
# myvalues.yaml
ollama:
  enabled: true
  ollama:
    gpu:
      enabled: false
  persistentVolume:
    enabled: true
    storageClass: nfs-client
pipelines:
  enabled: true
  persistence:
    enabled: true
    storageClass: "nfs-client"
service:
  type: NodePort
extraEnvVars:
  - name: HF_ENDPOINT
    value: https://hf-mirror.com

Deploy the chart with:
helm upgrade --install ollama open-webui/open-webui -f myvalues.yaml --create-namespace --namespace kube-ai

After installation, check the pods in the kube-ai namespace:
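If the cluster has NVIDIA GPUs and the NVIDIA device plugin installed, the same values file can enable GPU-accelerated inference instead. A sketch of the relevant overrides (key names should be verified against the chart version in use):

```yaml
# myvalues-gpu.yaml — assumes the NVIDIA device plugin is running on the cluster
ollama:
  ollama:
    gpu:
      enabled: true
      type: nvidia   # assumption: the bundled ollama subchart selects the GPU vendor here
      number: 1      # request one GPU for the Ollama pod
```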
$ kubectl get pods -n kube-ai
NAME                                    READY   STATUS    RESTARTS   AGE
open-webui-0                            1/1     Running   0          2m11s
open-webui-ollama-944dd68fc-wxsjf       1/1     Running   0          24h
open-webui-pipelines-557f6f95cd-dfgh8   1/1     Running   0          25h

The NodePort service exposes Open-WebUI at http://<node-ip>:31009. Open the UI, register an account, and verify the Ollama API endpoint, which the chart configures automatically.
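The assigned NodePort can be confirmed from the service object, and port-forwarding offers local access without exposing a node port at all. The service name below is an assumption based on the release name used in this guide:

```shell
# Print the node port mapped to the web UI service
kubectl get svc open-webui -n kube-ai -o jsonpath='{.spec.ports[0].nodePort}'

# Alternative: forward the service to localhost:8080 for local access
kubectl port-forward -n kube-ai svc/open-webui 8080:80
```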
In the UI, add an external connection if Ollama runs elsewhere, then open the Model tab to pull models from https://ollama.com/library (e.g., llama3). Once the download completes, select the model to start chatting.
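Models can also be pulled without the UI, either via the Ollama CLI inside the pod or via Ollama's REST API (port 11434 is Ollama's default). The deployment name here is an assumption inferred from the pod listing above:

```shell
# Pull llama3 directly inside the Ollama container
kubectl exec -n kube-ai deploy/open-webui-ollama -- ollama pull llama3

# Or, if curl is available in the image, call the Ollama pull API
kubectl exec -n kube-ai deploy/open-webui-ollama -- \
  curl -s http://localhost:11434/api/pull -d '{"name": "llama3"}'
```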
In summary, by containerizing Ollama and Open‑WebUI and orchestrating them with Kubernetes and Helm, you obtain a scalable, maintainable AI‑powered chatbot deployment that leverages cloud‑native best practices and simplifies LLM management.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.