Tagged articles
7 articles
HyperAI Super Neural
Sep 18, 2025 · Artificial Intelligence

DeepSeek‑R1 Costs $294K to Train, Hits Nature Cover as First Peer‑Reviewed Large Model

DeepSeek‑R1, the first mainstream large language model to pass peer review in Nature, was trained for $294,000 on 512 (64×8) H800 GPUs. DeepSeek‑R1‑Zero, the variant trained purely with reinforcement learning, reached up to 86.7% accuracy on AIME 2024, outperforming human averages across math, coding, and science tasks.

AI research · DeepSeek-R1 · Peer Review
10 min read
Architects' Tech Alliance
Feb 28, 2025 · Artificial Intelligence

DeepSeek V3 & R1: How Their Training Costs Compare to Llama 3.1

The article analyzes DeepSeek’s latest V3 chat model and R1 reasoning model, detailing their MoE architecture and V3's training on H800 GPUs at a cost of about $5.58 million. It compares compute expenses with Meta’s Llama 3.1 and shows that DeepSeek's API pricing is roughly one‑tenth of GPT‑4o for chat and one‑twentieth of OpenAI o1 for reasoning.

AI model analysis · DeepSeek · inference pricing
4 min read
Alibaba Cloud Developer
Feb 7, 2025 · Artificial Intelligence

Why DeepSeek V3 Achieves Low Training Costs: Inside Its AI Innovations

This article provides a comprehensive analysis of DeepSeek's large‑language‑model technology, covering the company's background, model capabilities, and remarkably low training and inference costs. It also covers the core architectural and algorithmic innovations that enable efficient large‑scale AI deployment: MoE, Multi‑head Latent Attention (MLA), FP8 mixed‑precision training, and DualPipe pipeline parallelism.

AI Architecture · DeepSeek · FP8 training
19 min read
Software Engineering 3.0 Era
Feb 1, 2025 · Artificial Intelligence

DeepSeek Deep Dive: How Its Breakthroughs Could Usher in an Era of Universal AI

The article provides a detailed analysis of DeepSeek’s model performance on language, reasoning, and code generation benchmarks, its cost‑effective training methods, its architectural innovations, and the team’s expertise. It then considers how these factors may accelerate AI innovation and reshape industry competition.

AI benchmarks · AI industry impact · DeepSeek
18 min read
Baobao Algorithm Notes
Jan 7, 2025 · Artificial Intelligence

How Efficient Is DeepSeek V3? Calculating Its MFU Around 37%

This article derives DeepSeek V3's training Model FLOPs Utilization (MFU) from publicly available data, showing an MFU of roughly 37% (about a 60% improvement over V2), and provides detailed formulas, parameter settings, and a reproducible Python script.

AI performance · DeepSeek · MFU
8 min read
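
For readers unfamiliar with MFU, the kind of estimate described in the entry above can be sketched in a few lines. This is not the linked article's script. It is a rough illustration that uses the common 6 × N × D approximation for training FLOPs, which ignores attention FLOPs and so tends to come out lower than a more detailed derivation. The inputs are assumptions not stated in this listing: the activated parameter count, token count, and pre-training GPU hours are the figures commonly quoted from DeepSeek's V3 technical report, and the per-GPU peak is the dense BF16 figure commonly quoted for H800 SXM parts. The function name `estimate_mfu` is hypothetical.

```python
def estimate_mfu(active_params, tokens, gpu_hours, peak_flops_per_gpu):
    """Rough training MFU: useful model FLOPs divided by peak available FLOPs.

    Uses the 6 * N * D rule of thumb (forward + backward pass) and
    ignores attention FLOPs, so it is a lower-bound style estimate.
    """
    model_flops = 6.0 * active_params * tokens
    available_flops = gpu_hours * 3600.0 * peak_flops_per_gpu
    return model_flops / available_flops


# Assumed inputs (not taken from this listing):
ACTIVE_PARAMS = 37e9       # parameters activated per token (MoE)
TOKENS = 14.8e12           # pre-training tokens
GPU_HOURS = 2.664e6        # H800 GPU hours for pre-training
PEAK_FLOPS = 989e12        # assumed dense BF16 peak per GPU, FLOP/s

mfu = estimate_mfu(ACTIVE_PARAMS, TOKENS, GPU_HOURS, PEAK_FLOPS)
print(f"Estimated MFU: {mfu:.1%}")
```

Under these assumptions the estimate lands in the mid‑30% range, somewhat below the roughly 37% the article reports, which is consistent with the attention FLOPs this simplified formula leaves out.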