How DeepSeek Trained Its R1 LLM for Just $294K – Inside the R1 Model
The article reports that DeepSeek's R1 large language model, detailed in a peer-reviewed Nature paper, was built for roughly $300,000 (about $294,000 in training cost) using Nvidia H800 chips and a novel pure reinforcement-learning approach. It achieves competitive performance while remaining openly available.
Overview
DeepSeek's R1 is a large language model (LLM) designed for reasoning tasks. It is the first mainstream LLM to be described in a fully peer-reviewed paper published in Nature (see the accompanying Nature news article, DOI: 10.1038/d41586-025-03015-6).
Cost and Funding
The headline cost is approximately $300,000; the reported training cost for R1 itself is about $294,000 (29.4 × 10⁴ USD). This is far below the tens of millions of dollars typically reported for comparable models.
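The 29.4 × 10⁴ figure is written in units of ten thousand (the Chinese 万), which is easy to misread by a factor of ten. A minimal sketch of the conversion follows; `WAN` and `wan_to_usd` are hypothetical names introduced here for illustration only.

```python
# One "wan" (万) is ten thousand; cost figures in the source use this unit.
WAN = 10_000


def wan_to_usd(amount_in_wan: float) -> int:
    """Convert an amount given in units of 10^4 USD to whole dollars."""
    return round(amount_in_wan * WAN)


# Training cost quoted in the text: 29.4 x 10^4 USD.
training_cost = wan_to_usd(29.4)

# Rounded to the nearest hundred thousand dollars, as in the headline figure.
headline_cost = round(training_cost, -5)

print(f"training: ${training_cost:,}")
print(f"headline: ${headline_cost:,}")
```

Rounding to whole dollars inside the helper avoids carrying floating-point noise into later comparisons.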
Hardware and Training Infrastructure
Training was performed on Nvidia H800 GPUs. Export restrictions have limited the availability of H800 chips in China since 2023, but DeepSeek completed R1's training on these GPUs.
Training Methodology
R1 uses a pure reinforcement learning (RL) pipeline that does not rely on copying or imitating examples generated by other LLMs. The model learns through an automated trial‑and‑error process guided by a reward model, allowing it to develop its own inference strategies without human‑crafted reasoning templates.
Two key algorithmic innovations were introduced:
Group-relative policy optimization: the model scores each of its own attempts with an estimate made relative to a group of sampled outputs, rather than relying on a separate algorithm to do the scoring.
Self-evaluation: the model generates its own quality estimates for candidate responses, reducing dependence on external data.
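The group-relative scoring idea can be sketched in a few lines. This is an illustration under stated assumptions, not DeepSeek's implementation: the function name `group_relative_advantages`, the 0/1 correctness rewards, and the use of the population standard deviation are all choices made here for clarity.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt.

    The group mean acts as the baseline, so no separately trained
    scoring model is needed to judge whether an answer was above average.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt:
# 1.0 if the final answer was correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers receive positive advantages, incorrect ones negative.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

In a full training loop, these advantages would weight the policy-gradient update for each sampled answer. When every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.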
Performance Evaluation
Independent benchmarks such as the ScienceAgentBench suite show that R1 achieves a strong balance of capability and cost. While it may not lead in raw accuracy on every metric, it outperforms many proprietary models in efficiency and overall utility.
Open‑Source Release
The model weights and associated code have been released publicly on Hugging Face, enabling community experimentation. The release includes a detailed cost breakdown, providing a case study for low‑budget LLM development.
Key Technical Details
Model size and architecture: standard transformer backbone optimized for reasoning tasks (exact parameters not disclosed).
Training data: the base model was pretrained on publicly available internet text; specific datasets for the RL stage were not disclosed.
Training budget: about $294,000 (29.4 × 10⁴ USD) for the RL training stage, on top of an estimated $6 million or so (about 600 × 10⁴ USD) spent on the base model.
Hardware configuration: multiple Nvidia H800 GPUs (exact count not disclosed).
Reproducibility and Community Impact
The peer‑reviewed publication and open release of weights aim to improve transparency and reproducibility in LLM research. Researchers can replicate the training pipeline, evaluate safety and effectiveness, and explore extensions of the pure‑RL approach.
Author: Elizabeth Gibney
Translator: Zhao Ruxuan (赵茹萱)
The first peer-reviewed study shows how the Chinese start-up built a market-shaking large language model for $300,000.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
