DeepSeek: China’s New LLM Dark Horse – First Impressions and Shockingly Low Prices
The article evaluates DeepSeek v2, a hundred‑billion‑parameter‑scale MoE model, highlighting its near‑GPT‑4 benchmark performance, OpenAI‑compatible API, 32k‑token context, exceptionally low pricing, a custom token‑utilization metric, and the practical drawbacks observed during hands‑on testing.
Model Overview
DeepSeek v2 is an open‑source mixture‑of‑experts (MoE) large language model at the hundred‑billion‑parameter scale, released four months after the first open‑source MoE model.
Benchmark Position
In a series of benchmark tests the model ranks among the top open‑source and even closed‑source models, approaching the performance of GPT‑4.
Pricing Advantage
Architectural innovations reduce inference cost, resulting in API pricing that is roughly one‑tenth to one‑hundredth of competing services.
API Pricing (yuan per million tokens)
OpenAI gpt‑4‑turbo – input 72.30 ¥, output 216.90 ¥
文心 ERNIE‑4.0‑8K – input 120 ¥, output 120 ¥
通义千问 qwen‑max – input 120 ¥, output 120 ¥
智谱 GLM‑4 – input 100 ¥, output 100 ¥
Kimi moonshot‑v1‑32k – input 24 ¥, output 24 ¥
Kimi moonshot‑v1‑8k – input 12 ¥, output 12 ¥
MiniMax abab6.5 – input 30 ¥, output 30 ¥
MiniMax abab6.5s – input 10 ¥, output 10 ¥
DeepSeek deepseek‑chat (32k) – input 1 ¥, output 2 ¥
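As a sanity check on the claimed one‑tenth to one‑hundredth gap, a short sketch that totals the listed prices for a hypothetical workload; the 1M‑input / 1M‑output job size is an assumption chosen purely for illustration:

```python
# Back-of-the-envelope cost comparison using the listed prices
# (¥ per million tokens) for a hypothetical 1M-input / 1M-output job.
prices = {  # (input ¥/Mtok, output ¥/Mtok), from the table above
    "gpt-4-turbo": (72.30, 216.90),
    "ERNIE-4.0-8K": (120, 120),
    "qwen-max": (120, 120),
    "GLM-4": (100, 100),
    "moonshot-v1-32k": (24, 24),
    "abab6.5": (30, 30),
    "deepseek-chat": (1, 2),
}

IN_MTOK, OUT_MTOK = 1.0, 1.0  # millions of tokens

for model, (p_in, p_out) in prices.items():
    cost = IN_MTOK * p_in + OUT_MTOK * p_out
    print(f"{model}: {cost:.2f} ¥")

# Ratio for this job: (72.30 + 216.90) / (1 + 2) ≈ 96x cheaper than
# gpt-4-turbo, consistent with the "one-tenth to one-hundredth" claim.
```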
Access and API Compatibility
The web chat at chat.deepseek.com runs the latest v2 model but lacks multimodal support and conversation management. The API platform (platform.deepseek.com) grants a 10‑yuan trial credit, equivalent to about 5 million tokens, sufficient for a month of testing.
Only two core endpoints are offered: model listing and chat completion. The API is deliberately compatible with OpenAI’s, allowing tools that target OpenAI to switch to DeepSeek without code changes.
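A minimal sketch of what that compatibility looks like on the wire, using only the Python standard library. The payload follows the OpenAI chat‑completions schema; the base URL and `deepseek-chat` model name match the platform described above, and the API key shown is a placeholder. The request is built but not sent, so the example stays offline:

```python
# Build an OpenAI-style chat-completion request against DeepSeek's
# OpenAI-compatible endpoint (stdlib only; request is not sent here).
import json
import urllib.request

API_KEY = "sk-..."  # hypothetical placeholder; get a real key on the platform
url = "https://api.deepseek.com/chat/completions"

payload = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted to keep this offline.
print(req.full_url, payload["model"])
```

Because only the base URL and model name differ from an OpenAI call, tools that already speak the OpenAI schema need no other changes.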
Context Length
The open‑source version supports a 128 k token context window, while the hosted web chat and current API expose a 32 k token limit.
Token Utilization Metric
The author defines “Token Utilization” as the number of Chinese characters encoded per model token (characters ÷ tokens, so higher means fewer tokens consumed for the same text), using a 1,690‑character essay as test data. Results:
GPT‑4 – 2,267 tokens, utilization 0.75
Kimi – 1,203 tokens, utilization 1.40
Qwen‑max – 1,234 tokens, utilization 1.37
DeepSeek – 1,283 tokens, utilization 1.32
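The metric above can be reproduced directly from the article's figures; a minimal sketch using the reported token counts:

```python
# Token utilization = Chinese characters per model token, for the same
# 1,690-character essay (token counts as reported in the article).
ESSAY_CHARS = 1690

token_counts = {
    "GPT-4": 2267,
    "Kimi": 1203,
    "Qwen-max": 1234,
    "DeepSeek": 1283,
}

for model, tokens in token_counts.items():
    utilization = ESSAY_CHARS / tokens
    print(f"{model}: {tokens} tokens, utilization {utilization:.2f}")
```

A utilization above 1.0 means the tokenizer packs more than one Chinese character into each token on average, which directly lowers billed token volume.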
Hands‑On Performance Evaluation
Prompt templates previously used for Kimi (structured prompts, multi‑turn dialogue, role‑play, and language processing) were applied to DeepSeek's chat assistant and API. In the web chat the model handled complex instructions correctly, and in a comparative test covering the same scenarios it scored highly overall, indicating strong capability with complex instructions.
Observed Shortcomings
Risk‑control stability: over‑strict content filtering can block normal user input, returning a “Content Exists Risk” error.
Inference speed: API latency is higher than GPT‑4's, possibly due to risk‑control overhead.
Output style: responses tend to be verbose; achieving concise output may require more sophisticated prompting, such as few‑shot examples.
Product completeness: features like JSON mode and function calling are absent, limiting developer ergonomics.
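For the verbosity issue, a hedged sketch of the few‑shot workaround mentioned above: seed the conversation with short example exchanges so the model imitates their brevity. The message schema is the OpenAI‑compatible one the API accepts; the example Q&A pairs are invented for illustration, not taken from the article:

```python
# Few-shot prompt scaffold for concise answers: a system instruction plus
# two deliberately terse example exchanges, then the real question last.
concise_few_shot = [
    {"role": "system", "content": "Answer in one sentence, no preamble."},
    {"role": "user", "content": "What is an MoE model?"},
    {"role": "assistant", "content": "A model that routes each token to a few specialized expert sub-networks."},
    {"role": "user", "content": "What is context length?"},
    {"role": "assistant", "content": "The maximum number of tokens a model can attend to at once."},
]

def with_examples(question: str) -> list:
    """Append the real question after the few-shot examples."""
    return concise_few_shot + [{"role": "user", "content": question}]

msgs = with_examples("What is token utilization?")
print(len(msgs), msgs[-1]["role"])
```

The resulting list can be passed as the `messages` field of a chat‑completion request unchanged.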