Tagged articles

Mooncake

3 articles · Page 1 of 1
Huawei Cloud Developer Alliance
Huawei Cloud Developer Alliance
Apr 29, 2026 · Artificial Intelligence

Deploy DeepSeek‑V4 on Ascend NPU with Kthena in 3 Minutes (Prefill‑Decode Separation)

This guide walks through deploying the DeepSeek‑V4‑Flash model on Ascend NPU using Kthena’s ModelRoute, detailing the Prefill‑Decode (P/D) separation architecture, KV cache transfer via Mooncake, configuration of ModelServing and ModelRoute resources, and flexible scaling of Prefill and Decode replicas for optimal performance.

Ascend NPUDeepSeek-V4KV cache
0 likes · 22 min read
Deploy DeepSeek‑V4 on Ascend NPU with Kthena in 3 Minutes (Prefill‑Decode Separation)
Alibaba Cloud Developer
Alibaba Cloud Developer
Dec 24, 2025 · Artificial Intelligence

Boosting LLM Inference: RoleBasedGroup & Mooncake for Stable, High‑Performance Service

Large language model inference faces memory pressure, but by externalizing KVCache with Mooncake and orchestrating roles via the Kubernetes‑native RoleBasedGroup (RBG), developers can achieve stable, high‑throughput, cost‑effective serving with seamless in‑place upgrades and topology‑aware performance.

AI InfrastructureKVCacheKubernetes
0 likes · 21 min read
Boosting LLM Inference: RoleBasedGroup & Mooncake for Stable, High‑Performance Service