Tencent Technical Engineering
Jul 11, 2025 · Artificial Intelligence

How DeepSeek Achieved 15,800+ Tokens/s: Full‑Stack Inference Optimizations

This article details the Angel‑HCF team's end‑to‑end DeepSeek inference optimizations, including PD (prefill/decode) separation, multi‑layer MTP (multi‑token prediction), EP (expert parallelism) and DP (data parallelism), hardware‑aware kernels, and load‑balancing strategies. Together they boost throughput to over 15,800 tokens per second while keeping per‑token latency under 50 ms.

AI performance · DeepSeek · GPU utilization
13 min read
MaGe Linux Operations
Jan 28, 2024 · Cloud Native

Mastering Kind: Build, Configure, and Scale Kubernetes Clusters with Docker

This guide explains how to use Kind, a Docker‑based tool for creating Kubernetes test clusters. It covers Kind's architecture, network model, installation steps, cluster creation commands, multi‑node and multi‑control‑plane configurations, image loading, custom port mappings, ingress deployment, troubleshooting tips, and best‑practice recommendations.

Cluster · Configuration · Docker
15 min read