Deploy DeepSeek‑V4 on Ascend NPU with Kthena in 3 Minutes (Prefill‑Decode Separation)

This guide walks through deploying the DeepSeek‑V4‑Flash model on Ascend NPU using Kthena’s ModelRoute. It covers the Prefill‑Decode (P/D) separation architecture, KV cache transfer via Mooncake, configuration of the ModelServing and ModelRoute resources, and independent scaling of Prefill and Decode replicas to tune performance.

Ascend NPU · DeepSeek V4 · KV cache

22 min read

Huawei Cloud Developer Alliance

Apr 2, 2026 · Cloud Native

How Kthena Enables Production‑Grade LLM Inference on Kubernetes

This article analyzes the cloud‑native challenges of deploying large‑model inference on Kubernetes and presents Kthena’s architecture (ModelServing, Router, Autoscaler, and ModelBooster). It also covers Volcano integration, vLLM‑Ascend setup, and a real‑world Qwen3‑235B deployment case, highlighting performance gains and future directions.

Kthena · Kubernetes · LLM

13 min read
