Alibaba Cloud Native
Jun 28, 2025 · Cloud Native
Deploying vLLM with llmaz and Higress: A Step‑by‑Step Cloud‑Native Guide
This tutorial walks through deploying vLLM inference services on a GPU-enabled Kubernetes cluster with llmaz, configuring Higress as an AI gateway for traffic control, observability, and fallback model switching, and testing requests end to end.
Higress · fallback · llmaz
