DeepSeek V4 Surge: Technical Specs, Quantization Details, Deployment Costs, and Market Impact
The article compiles key information on DeepSeek V4, covering Ollama's one‑click launch, the model's FP4/FP8 mixed‑precision quantization, size reductions, high local deployment costs, recent benchmark rankings, and the accompanying stock price movements in both China and the US.
1. Rapid Launch with Ollama
Ollama offers one‑click startup for deepseek‑v4‑flash and integrates easily with Claude, Codex, OpenCode, and OpenClaw.
The author has previously argued that Ollama's shift from consumer‑grade offline inference to an online model provider is a shrewd move, and the launch command is especially convenient.
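As a sketch of what the one‑click startup looks like in practice, the commands below use Ollama's standard `pull`/`run` workflow; the model tag `deepseek-v4-flash` is taken from the article and is assumed, not verified against the Ollama registry.

```shell
# Fetch the model weights (tag assumed from the article, not verified).
ollama pull deepseek-v4-flash

# Start an interactive session, or pass a prompt directly:
ollama run deepseek-v4-flash "Summarize FP4 vs FP8 trade-offs in one paragraph."
```

Once the model is running, Ollama also exposes an OpenAI‑compatible local API, which is how tools like the coding agents mentioned above typically integrate with it.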
2. Price Expected to Drop Further
The author notes that DeepSeek likely plans extensive inference on Huawei hardware, suggesting future price reductions.
There is unconfirmed speculation about whether V4 was trained primarily on NVIDIA GPUs.
3. Quantization May Stifle Teams
The released model uses a mixed‑precision scheme: expert parameters in the MoE are stored in FP4, while most other parameters use FP8.
Because the expert weights are already at 4 bits, there is little headroom for further weight compression, which makes community quantization more difficult.
Compared with open‑source quantized versions of Qwen 3.6‑27B and Qwen 3.6‑35B, which saw many implementations within hours, DeepSeek V4’s quantization appears less accessible.
V4 Flash’s size reductions are modest: Q2 is 105 GB and Q3 is 135 GB, both in MLX format.
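To see why a mixed FP4/FP8 checkpoint leaves little quantization headroom, the sketch below computes approximate weight storage from per‑group bit widths. The parameter counts are hypothetical placeholders for illustration, not official DeepSeek V4 figures.

```python
def mixed_precision_bytes(expert_params: float, other_params: float,
                          expert_bits: int = 4, other_bits: int = 8) -> float:
    """Approximate weight storage (bytes) for a mixed-precision checkpoint:
    MoE expert weights at expert_bits, remaining weights at other_bits."""
    return (expert_params * expert_bits + other_params * other_bits) / 8

# Hypothetical split: 500B expert params in FP4, 100B other params in FP8.
total_bytes = mixed_precision_bytes(500e9, 100e9)
print(f"{total_bytes / 1e9:.0f} GB")  # 250 GB + 100 GB = 350 GB
```

With most of the model already at 4 bits, a community Q2/Q3 quant can only shrink the remaining FP8 portion and squeeze the experts slightly, which is consistent with the modest 105 GB / 135 GB sizes reported above.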
4. Local Deployment Cost Higher Than Expected
Even the Flash version requires hardware on the scale of an Atlas 800 A2 (8 × 64 GB), pushing costs above 1.1 million RMB.
The model’s weights total about 300 GB.
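A back‑of‑envelope check shows why the 8 × 64 GB configuration is the floor rather than overkill: the weights alone consume most of the aggregate memory, and the remainder must hold the KV cache and activations.

```python
# VRAM budget for the Atlas 800 A2 configuration quoted in the text.
cards = 8
vram_per_card_gb = 64
weights_gb = 300  # approximate total weight size from the article

total_vram_gb = cards * vram_per_card_gb      # aggregate memory
headroom_gb = total_vram_gb - weights_gb      # left for KV cache, activations
print(total_vram_gb, headroom_gb)             # 512 212
```

Roughly 212 GB of headroom sounds generous, but long‑context KV caches and per‑card fragmentation eat into it quickly, so dropping to fewer cards is not practical.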
5. Stock Price Fluctuations – “Everyone’s Rise Is the Real Rise”
Domestic DeepSeek concept stocks have surged, and overseas giants such as Nvidia and Intel have also risen.
The author invites readers to guess the underlying logic.
6. Philosophical Reflection
The quoted passage comes from Xunzi’s "Fei Shi Er Zi" (Against the Twelve Masters), summarizing principles for a gentleman’s conduct.
Key ideas include:
不诱于誉 (not enticed by praise): Remain clear‑headed when faced with praise and fame, refusing to be swayed.
不恐于诽 (not cowed by slander): Maintain composure in the face of criticism and slander.
率道而行 (follow the Way): Act on objective principles rather than seeking superficial approval.
端然正己 (rectify oneself): Uphold proper demeanor and rigorous self‑cultivation.
7. DeepSeek’s Capability
The author tested reading comprehension, SVG code generation, and aesthetic tasks, finding that the Flash version underperforms on them, though it still ranks highly on open‑source leaderboards.
On the Vibe Coding benchmark, DeepSeek V4 tops the open‑source list, and on the GDPval‑AA benchmark, which focuses on agentic ability and workplace productivity, it also posts the best open‑source score.
8. Official API Speed Test
The author presents a relatively fast result from the official API, illustrated by the accompanying chart.
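The article does not say how the speed test was computed, but a common approach is to timestamp each streamed token and derive throughput from the arrival times. The sketch below is a generic illustration of that calculation, not the author's actual test harness.

```python
def tokens_per_second(token_times: list[float]) -> float:
    """Throughput from per-token arrival timestamps (in seconds).

    Uses (n - 1) intervals between n tokens, so a single token
    yields 0.0 rather than a division by zero.
    """
    if len(token_times) < 2:
        return 0.0
    return (len(token_times) - 1) / (token_times[-1] - token_times[0])

# Simulated stream: 5 tokens arriving 50 ms apart -> 20 tokens/s.
stamps = [i * 0.05 for i in range(5)]
print(round(tokens_per_second(stamps), 1))  # 20.0
```

In a real test the timestamps would come from `time.monotonic()` calls inside the streaming loop of an API client.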
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
