Tencent Technical Engineering
Jul 11, 2025 · Artificial Intelligence
How DeepSeek Achieved 15,800+ Tokens/s: Full‑Stack Inference Optimizations
This article details the Angel‑HCF team's end‑to‑end DeepSeek inference optimizations—including PD separation, multi‑layer MTP, EP and DP parallelism, hardware‑aware kernels, and load‑balancing strategies—that boost throughput to over 15,800 tokens per second while keeping per‑token latency under 50 ms.
AI performance · DeepSeek · GPU utilization
13 min read
