Baobao Algorithm Notes
May 6, 2024 · Artificial Intelligence

DeepSeek-V2: 236B MoE LLM Delivers Higher Performance While Cutting Training Cost by 42%

DeepSeek-V2 is a 236-billion-parameter mixture-of-experts language model that reduces training cost by 42.5%, cuts KV-cache usage by 93.3%, and boosts generation throughput 5.76x, while achieving state-of-the-art scores on benchmarks such as MMLU, C-Eval, BBH, HumanEval, and GSM8K for both base and chat variants.

AI · DeepSeek-V2 · Mixture of Experts