Alibaba Cloud Infrastructure
Mar 18, 2025 · Cloud Native
Gray Release of LoRA and Base Models Using ACK Gateway with AI Extension on Kubernetes
This guide explains how to deploy large language model inference services on a GPU-enabled Kubernetes cluster, configure ACK Gateway with AI Extension for intelligent routing and load balancing, and perform gray releases for both LoRA fine‑tuned models and base models such as QwQ‑32B and DeepSeek‑R1, including step‑by‑step commands and validation procedures.
ACK GatewayAI inferenceCloud Native
0 likes · 25 min read