Tagged articles
2 articles
Huawei Cloud Developer Alliance
Apr 29, 2026 · Artificial Intelligence

Deploy DeepSeek‑V4 on Ascend NPU with Kthena in 3 Minutes (Prefill‑Decode Separation)

This guide walks through deploying the DeepSeek‑V4‑Flash model on Ascend NPU using Kthena’s ModelRoute, detailing the Prefill‑Decode (P/D) separation architecture, KV cache transfer via Mooncake, configuration of ModelServing and ModelRoute resources, and flexible scaling of Prefill and Decode replicas for optimal performance.

Ascend NPU · DeepSeek V4 · KV cache
0 likes · 22 min read
Huawei Cloud Developer Alliance
Apr 2, 2026 · Cloud Native

How Kthena Enables Production‑Grade LLM Inference on Kubernetes

This article analyzes the cloud‑native challenges of deploying large‑model inference on Kubernetes and presents Kthena’s architecture (ModelServing, Router, Autoscaler, and ModelBooster), along with Volcano integration, vLLM‑Ascend setup, and a real‑world Qwen3‑235B deployment case, highlighting performance gains and future directions.

Kthena · Kubernetes · LLM
0 likes · 13 min read