DeepSeek: China’s New LLM Dark Horse – First Impressions and Shockingly Low Prices
The article evaluates DeepSeek v2, a hundred‑billion‑parameter‑scale MoE model, highlighting its near‑GPT‑4 benchmark performance, OpenAI‑compatible API, 32k‑token context, exceptionally low pricing, a custom token‑utilization metric, and the practical drawbacks observed during hands‑on testing.
Model Overview
DeepSeek v2 is an open‑source mixture‑of‑experts (MoE) large language model at the hundred‑billion‑parameter scale, released four months after the first open‑source MoE model.
Benchmark Position
In a series of benchmark tests the model ranks among the top open‑source and even closed‑source models, approaching the performance of GPT‑4.
Pricing Advantage
Architectural innovations reduce inference cost, resulting in API pricing that is roughly one‑tenth to one‑hundredth of competing services.
API Pricing (yuan per million tokens)
OpenAI gpt‑4‑turbo – input 72.30 ¥, output 216.90 ¥
文心 ERNIE‑4.0‑8K – input 120 ¥, output 120 ¥
通义千问 qwen‑max – input 120 ¥, output 120 ¥
智谱 GLM‑4 – input 100 ¥, output 100 ¥
Kimi moonshot‑v1‑32k – input 24 ¥, output 24 ¥
Kimi moonshot‑v1‑8k – input 12 ¥, output 12 ¥
MiniMax abab6.5 – input 30 ¥, output 30 ¥
MiniMax abab6.5s – input 10 ¥, output 10 ¥
DeepSeek deepseek‑chat (32k) – input 1 ¥, output 2 ¥
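As a sanity check on the claimed one‑tenth to one‑hundredth gap, a short sketch that totals the listed prices for a hypothetical workload; the 1M‑input / 1M‑output job size is an assumption chosen purely for illustration:

```python
# Back-of-the-envelope cost comparison using the listed prices
# (¥ per million tokens) for a hypothetical 1M-input / 1M-output job.
prices = {  # (input ¥/Mtok, output ¥/Mtok), from the table above
    "gpt-4-turbo": (72.30, 216.90),
    "ERNIE-4.0-8K": (120, 120),
    "qwen-max": (120, 120),
    "GLM-4": (100, 100),
    "moonshot-v1-32k": (24, 24),
    "abab6.5": (30, 30),
    "deepseek-chat": (1, 2),
}

IN_MTOK, OUT_MTOK = 1.0, 1.0  # millions of tokens

for model, (p_in, p_out) in prices.items():
    cost = IN_MTOK * p_in + OUT_MTOK * p_out
    print(f"{model}: {cost:.2f} ¥")

# Ratio for this job: (72.30 + 216.90) / (1 + 2) ≈ 96x cheaper than
# gpt-4-turbo, consistent with the "one-tenth to one-hundredth" claim.
```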
Access and API Compatibility
The web chat at chat.deepseek.com runs the latest v2 model but lacks multimodal support and conversation management. The API platform (platform.deepseek.com) grants a 10‑yuan trial credit, equivalent to about 5 million tokens, sufficient for a month of testing.
Only two core endpoints are offered: model listing and chat completion. The API is deliberately compatible with OpenAI’s, allowing tools that target OpenAI to switch to DeepSeek without code changes.
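A minimal sketch of what that compatibility looks like on the wire, using only the Python standard library. The payload follows the OpenAI chat‑completions schema; the base URL and `deepseek-chat` model name match the platform described above, and the API key shown is a placeholder. The request is built but not sent, so the example stays offline:

```python
# Build an OpenAI-style chat-completion request against DeepSeek's
# OpenAI-compatible endpoint (stdlib only; request is not sent here).
import json
import urllib.request

API_KEY = "sk-..."  # hypothetical placeholder; get a real key on the platform
url = "https://api.deepseek.com/chat/completions"

payload = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted to keep this offline.
print(req.full_url, payload["model"])
```

Because only the base URL and model name differ from an OpenAI call, tools that already speak the OpenAI schema need no other changes.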
Context Length
The open‑source version supports a 128 k token context window, while the hosted web chat and current API expose a 32 k token limit.
Token Utilization Metric
The author defines “Token Utilization” as the number of Chinese characters encoded per model token (characters ÷ tokens, so higher means fewer tokens consumed for the same text), using a 1,690‑character essay as test data. Results:
GPT‑4 – 2,267 tokens, utilization 0.75
Kimi – 1,203 tokens, utilization 1.40
Qwen‑max – 1,234 tokens, utilization 1.37
DeepSeek – 1,283 tokens, utilization 1.32
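The metric above can be reproduced directly from the article's figures; a minimal sketch using the reported token counts:

```python
# Token utilization = Chinese characters per model token, for the same
# 1,690-character essay (token counts as reported in the article).
ESSAY_CHARS = 1690

token_counts = {
    "GPT-4": 2267,
    "Kimi": 1203,
    "Qwen-max": 1234,
    "DeepSeek": 1283,
}

for model, tokens in token_counts.items():
    utilization = ESSAY_CHARS / tokens
    print(f"{model}: {tokens} tokens, utilization {utilization:.2f}")
```

A utilization above 1.0 means the tokenizer packs more than one Chinese character into each token on average, which directly lowers billed token volume.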
Hands‑On Performance Evaluation
Prompt templates previously used for Kimi (structured prompts, multi‑turn dialogue, role‑play, and language processing) were applied to DeepSeek's chat assistant and API. In the web chat the model handled complex instructions correctly, and in a comparative test covering the same scenarios it scored highly overall, indicating strong capability with complex instructions.
Observed Shortcomings
Risk‑control stability: over‑strict content filtering can block normal user input, returning a “Content Exists Risk” error.
Inference speed: API latency is higher than GPT‑4's, possibly due to risk‑control overhead.
Output style: responses tend to be verbose; achieving concise output may require more sophisticated prompting, such as few‑shot examples.
Product completeness: features like JSON mode and function calling are absent, limiting developer ergonomics.
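For the verbosity issue, a hedged sketch of the few‑shot workaround mentioned above: seed the conversation with short example exchanges so the model imitates their brevity. The message schema is the OpenAI‑compatible one the API accepts; the example Q&A pairs are invented for illustration, not taken from the article:

```python
# Few-shot prompt scaffold for concise answers: a system instruction plus
# two deliberately terse example exchanges, then the real question last.
concise_few_shot = [
    {"role": "system", "content": "Answer in one sentence, no preamble."},
    {"role": "user", "content": "What is an MoE model?"},
    {"role": "assistant", "content": "A model that routes each token to a few specialized expert sub-networks."},
    {"role": "user", "content": "What is context length?"},
    {"role": "assistant", "content": "The maximum number of tokens a model can attend to at once."},
]

def with_examples(question: str) -> list:
    """Append the real question after the few-shot examples."""
    return concise_few_shot + [{"role": "user", "content": question}]

msgs = with_examples("What is token utilization?")
print(len(msgs), msgs[-1]["role"])
```

The resulting list can be passed as the `messages` field of a chat‑completion request unchanged.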