Deploy DeepSeek‑V4 on Ascend NPU with Kthena in 3 Minutes (Prefill‑Decode Separation)
This guide walks through deploying the DeepSeek‑V4‑Flash model on Ascend NPUs using Kthena's ModelRoute. It covers the Prefill‑Decode (P/D) separation architecture, KV cache transfer from Prefill to Decode instances via Mooncake, configuration of the ModelServing and ModelRoute resources, and independent scaling of Prefill and Decode replicas so that each stage can be sized to its own load.
