DeepSeek V3 & R1: How Their Training Costs Compare to Llama 3.1
The article analyzes DeepSeek’s latest V3 conversational model and R1 reasoning model, detailing their MoE architecture, V3’s training run on H800 GPUs at a cost of roughly $5.58 million, how that compute bill compares to Meta’s Llama 3.1, and API pricing that is roughly one‑tenth of GPT‑4o for dialogue and one‑twentieth of OpenAI o1 for reasoning.
DeepSeek has released two high‑profile models: the V3 conversational model, which uses a Mixture‑of‑Experts (MoE) architecture for multi‑task performance, and the R1 reasoning model, trained with reinforcement learning to excel at code generation and complex mathematical reasoning. The two models launched in late 2024 and early 2025 respectively, and drove a surge of public interest in DeepSeek, with the WeChat index reaching around 60 million on December 28 and peaking at 98 million on January 31.
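Since the MoE design is central to V3’s efficiency claims, here is a minimal sketch of the core idea, top‑k expert routing, in plain Python. The expert count, the fixed gate scores, and the scalar “experts” are toy assumptions for illustration only, not DeepSeek’s actual configuration; the point is that each token activates only a small subset of the experts.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate, top_k=2):
    # Score every expert, keep only the top-k, renormalize their
    # probabilities, and mix the selected experts' outputs.
    # Experts outside the top-k are never evaluated.
    probs = softmax(gate(x))
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Toy setup: four scalar "experts" and a hand-rolled gate with fixed scores.
experts = [lambda x, k=k: k * x for k in (1.0, 2.0, 3.0, 4.0)]
gate = lambda x: [0.1, 0.2, 2.0, 1.5]   # illustrative scores, not learned
y = moe_forward(5.0, experts, gate, top_k=2)  # mixes experts 2 and 3 only
```

In a real MoE transformer the gate is a learned linear layer per token and the experts are feed‑forward sub‑networks, but the routing‑and‑renormalize step is the same shape as above.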
The V3 model was trained on a cluster of 2,048 H800 GPUs. Each trillion training tokens took about 3.7 days on that cluster, and the full pre‑training run consumed roughly 2.788 million GPU‑hours. At an assumed rental rate of $2 per GPU‑hour, the hardware expense for one round of training was approximately $5.58 million. By contrast, Meta’s Llama 3.1 405B reportedly consumed about 30.84 million GPU‑hours, putting DeepSeek’s training compute at roughly one‑eleventh of the competitor’s.
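The arithmetic behind the headline figure is easy to check. A short back‑of‑envelope script, using the GPU‑hour total stated in the DeepSeek‑V3 technical report, and treating the $2‑per‑GPU‑hour rental rate and the Llama 3.1 405B GPU‑hour figure as reported assumptions rather than audited costs:

```python
# Back-of-envelope check of the training-cost comparison.
GPU_HOURS_V3 = 2.788e6      # H800 GPU-hours, per the DeepSeek-V3 technical report
GPU_HOURS_LLAMA = 30.84e6   # GPU-hours reported for Llama 3.1 405B (assumption)
RATE = 2.0                  # USD per GPU-hour (assumed rental price)

cost_v3 = GPU_HOURS_V3 * RATE
compute_ratio = GPU_HOURS_LLAMA / GPU_HOURS_V3

print(f"V3 pre-training: ${cost_v3 / 1e6:.2f}M")      # → V3 pre-training: $5.58M
print(f"Llama 3.1 used {compute_ratio:.1f}x the GPU-hours")
```

Note that this is hardware rental cost for one training run only; it excludes research staff, ablation runs, and data costs, as the V3 report itself cautions.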
In terms of API pricing, DeepSeek’s official rates position the V3 dialogue model at about one‑tenth the price of OpenAI’s GPT‑4o, while the R1 reasoning model is priced at roughly one‑twentieth of OpenAI’s o1 model. These cost advantages highlight DeepSeek’s focus on delivering high‑performance AI services at a lower price point.
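To make a price‑ratio claim like “one‑tenth” concrete, per‑request cost can be computed from any two per‑million‑token price schedules. The prices below are illustrative placeholders chosen to produce a 10x ratio, not official rates for any model:

```python
def request_cost(tokens_in, tokens_out, price_in, price_out):
    """Cost in USD for one API call; prices are per million tokens."""
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# Hypothetical schedules: model A priced at one-tenth of model B.
cost_a = request_cost(2000, 500, price_in=0.25, price_out=1.00)
cost_b = request_cost(2000, 500, price_in=2.50, price_out=10.00)
ratio = cost_b / cost_a   # → 10.0
```

Since input and output tokens are usually priced differently, the effective ratio between two models depends on the input/output mix of the workload, which is why published comparisons are approximate.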
The article also lists numerous related analyses and reports from academic institutions and industry experts, providing further context on DeepSeek’s technology, market impact, and potential applications. However, the core technical insight centers on the models’ architecture, training efficiency, and competitive pricing.
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.