DeepSeek V4 Surge: Technical Specs, Quantization Details, Deployment Costs, and Market Impact
The article compiles key information on DeepSeek V4, covering Ollama's one‑click launch, the model's FP4/FP8 mixed‑precision quantization, size reductions, high local deployment costs, recent benchmark rankings, and the accompanying stock price movements in both China and the US.
1. Rapid Launch with Ollama
Ollama offers one‑click startup for deepseek‑v4‑flash and integrates easily with Claude, Codex, OpenCode, and OpenClaw.
The author has previously argued that Ollama's shift from consumer‑grade offline inference to an online model provider is a shrewd move, and the launch command is especially convenient.
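As a sketch of what the one‑click startup looks like in practice, the commands below use Ollama's standard `pull`/`run` workflow; the model tag `deepseek-v4-flash` is taken from the article and is assumed, not verified against the Ollama registry.

```shell
# Fetch the model weights (tag assumed from the article, not verified).
ollama pull deepseek-v4-flash

# Start an interactive session, or pass a prompt directly:
ollama run deepseek-v4-flash "Summarize FP4 vs FP8 trade-offs in one paragraph."
```

Once the model is running, Ollama also exposes an OpenAI‑compatible local API, which is how tools like the coding agents mentioned above typically integrate with it.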
2. Price Expected to Drop Further
The author notes that DeepSeek likely plans extensive inference on Huawei hardware, suggesting future price reductions.
There is unconfirmed speculation about whether V4 was trained primarily on NVIDIA GPUs.
3. Quantization May Stifle Teams
The released model uses a mixed‑precision scheme: expert parameters in the MoE are stored in FP4, while most other parameters use FP8.
Because the expert weights are already at 4 bits, there is little headroom for further weight compression, which makes community quantization more difficult.
Compared with open‑source quantized versions of Qwen 3.6‑27B and Qwen 3.6‑35B, which saw many implementations within hours, DeepSeek V4’s quantization appears less accessible.
V4 Flash’s size reductions are modest: Q2 is 105 GB and Q3 is 135 GB, both in MLX format.
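To see why a mixed FP4/FP8 checkpoint leaves little quantization headroom, the sketch below computes approximate weight storage from per‑group bit widths. The parameter counts are hypothetical placeholders for illustration, not official DeepSeek V4 figures.

```python
def mixed_precision_bytes(expert_params: float, other_params: float,
                          expert_bits: int = 4, other_bits: int = 8) -> float:
    """Approximate weight storage (bytes) for a mixed-precision checkpoint:
    MoE expert weights at expert_bits, remaining weights at other_bits."""
    return (expert_params * expert_bits + other_params * other_bits) / 8

# Hypothetical split: 500B expert params in FP4, 100B other params in FP8.
total_bytes = mixed_precision_bytes(500e9, 100e9)
print(f"{total_bytes / 1e9:.0f} GB")  # 250 GB + 100 GB = 350 GB
```

With most of the model already at 4 bits, a community Q2/Q3 quant can only shrink the remaining FP8 portion and squeeze the experts slightly, which is consistent with the modest 105 GB / 135 GB sizes reported above.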
4. Local Deployment Cost Higher Than Expected
Even the Flash version requires hardware on the scale of an Atlas 800 A2 (8 × 64 GB), pushing costs above 1.1 million RMB.
The model’s weights total about 300 GB.
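A back‑of‑envelope check shows why the 8 × 64 GB configuration is the floor rather than overkill: the weights alone consume most of the aggregate memory, and the remainder must hold the KV cache and activations.

```python
# VRAM budget for the Atlas 800 A2 configuration quoted in the text.
cards = 8
vram_per_card_gb = 64
weights_gb = 300  # approximate total weight size from the article

total_vram_gb = cards * vram_per_card_gb      # aggregate memory
headroom_gb = total_vram_gb - weights_gb      # left for KV cache, activations
print(total_vram_gb, headroom_gb)             # 512 212
```

Roughly 212 GB of headroom sounds generous, but long‑context KV caches and per‑card fragmentation eat into it quickly, so dropping to fewer cards is not practical.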
5. Stock Price Fluctuations – “Everyone’s Rise Is the Real Rise”
Domestic DeepSeek concept stocks have surged, and overseas giants such as Nvidia and Intel have also risen.
The author invites readers to guess the underlying logic.
6. Philosophical Reflection
The quoted passage comes from Xunzi’s "Fei Shi Er Zi" (Against the Twelve Masters), summarizing principles for a gentleman’s conduct.
Key ideas include:
不诱于誉 (not enticed by praise): Remain clear‑headed when faced with praise and fame, refusing to be swayed.
不恐于诽 (not cowed by slander): Maintain composure in the face of criticism and slander.
率道而行 (follow the Way): Act on objective principles rather than seeking superficial approval.
端然正己 (rectify oneself): Uphold proper demeanor and rigorous self‑cultivation.
7. DeepSeek’s Capability
The author tested reading comprehension, SVG code generation, and aesthetic tasks, finding that the Flash version underperforms on them, though it still ranks highly on open‑source leaderboards.
On the Vibe Coding benchmark, DeepSeek V4 tops the open‑source list, and on the GDPval‑AA benchmark, which focuses on agentic ability and workplace productivity, it also posts the best open‑source score.
8. Official API Speed Test
The author presents a relatively fast result from the official API, illustrated by the accompanying chart.
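The article does not say how the speed test was computed, but a common approach is to timestamp each streamed token and derive throughput from the arrival times. The sketch below is a generic illustration of that calculation, not the author's actual test harness.

```python
def tokens_per_second(token_times: list[float]) -> float:
    """Throughput from per-token arrival timestamps (in seconds).

    Uses (n - 1) intervals between n tokens, so a single token
    yields 0.0 rather than a division by zero.
    """
    if len(token_times) < 2:
        return 0.0
    return (len(token_times) - 1) / (token_times[-1] - token_times[0])

# Simulated stream: 5 tokens arriving 50 ms apart -> 20 tokens/s.
stamps = [i * 0.05 for i in range(5)]
print(round(tokens_per_second(stamps), 1))  # 20.0
```

In a real test the timestamps would come from `time.monotonic()` calls inside the streaming loop of an API client.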
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
