Baobao Algorithm Notes
May 6, 2024 · Artificial Intelligence
DeepSeek-V2: 236B MoE LLM Delivers Higher Performance While Cutting Training Cost by 42.5%
DeepSeek-V2 is a 236-billion-parameter mixture-of-experts language model (21B parameters activated per token) that, relative to DeepSeek 67B, cuts training cost by 42.5%, reduces KV-cache memory by 93.3%, and boosts maximum generation throughput by 5.76×, while both its base and chat variants achieve state-of-the-art scores on benchmarks such as MMLU, C-Eval, BBH, HumanEval, and GSM8K.
AI · DeepSeek-V2 · Mixture of Experts
11 min read
