Artificial Intelligence 21 min read

Why DeepSeek Is Shaking Up the LLM Landscape: Architecture, Performance, and Cost

DeepSeek, a Chinese AI startup, offers open‑source large language models—DeepSeek‑V3 for general tasks and DeepSeek‑R1 for intensive reasoning—featuring MoE, MLA, low‑cost training, and competitive performance against OpenAI’s GPT‑4o, while providing detailed usage guidance and cost analysis.

Alibaba Cloud Developer

Mar 26, 2025

Why DeepSeek Is Shaking Up the LLM Landscape: Architecture, Performance, and Cost

Large Model Development Overview

From OpenAI’s GPT‑4o (2024) that handles text, images, and audio to the later GPT‑4o mini, o1‑preview, o1‑mini, o1, and GPT‑4.5, the trend shows increasingly multimodal capabilities, lower inference costs, and stronger reasoning models.

2024 May – OpenAI released GPT‑4o , achieving state‑of‑the‑art results on speech, multilingual, and vision benchmarks. July 2024 – GPT‑4o mini replaced GPT‑3.5 Turbo with significantly lower API costs. September 2024 – o1‑preview (reasoning‑focused) and o1‑mini (coding‑focused) were introduced, emphasizing longer “thinking” time before answering. December 2024 – o1 added more multimodal features. February 2025 – o3‑mini delivered faster, more accurate answers, and the deep‑research agent enabled web browsing and data analysis. By February 2025, OpenAI announced internal progress to GPT‑4.5, focusing on high‑capacity reasoning, multimodality, and agents.

Fundamentals of Large Models

Training a large language model follows eight core steps: data collection and cleaning, tokenization, positional encoding, pre‑training (next‑token prediction), supervised fine‑tuning (SFT), reinforcement learning from human feedback (RLHF), greedy token generation, and inference optimization. The three essential stages are pre‑training, supervised fine‑tuning, and RLHF.

General vs. Reasoning Models

Large models can be divided into general models —optimized for language generation, context understanding, and NLP tasks—and reasoning models —enhanced for logical analysis, decision‑making, and complex problem solving. Reasoning models often employ chain‑of‑thought (CoT) prompting to produce intermediate reasoning steps, improving performance on arithmetic, commonsense, and scientific queries.

DeepSeek Overview

DeepSeek (深度求索) is a Chinese AI company founded in July 2023, focusing on AGI research and large‑model development. Its flagship models are:

DeepSeek‑V3 : an open‑source general‑purpose LLM positioned against GPT‑4o.

DeepSeek‑R1 : an open‑source reasoning model targeting complex tasks and comparable to OpenAI’s o1/o1‑mini.

Both models support intelligent dialogue, text generation, semantic understanding, code generation, web‑search, and document ingestion.

Technical Roadmap

DeepSeek’s efficiency stems from several innovations:

DeepSeekMoE : a mixture‑of‑experts architecture that activates only a subset of experts (≈37 B parameters) during inference, reducing compute compared to a full 671 B‑parameter model.

Natural load‑balancing without auxiliary losses to distribute expert workload evenly.

MLA (Multi‑Head Latent Attention) : introduces latent vectors to compress key‑value caches, cutting KV memory to 5‑13 % of traditional MHA and lowering compute.

Multi‑Token Prediction (MTP) : predicts multiple tokens simultaneously in suitable scenarios, improving token density and reducing context drift.

GRPO (Group Relative Policy Optimization) : a reinforcement‑learning algorithm that samples multiple outputs per query and optimizes the policy based on relative rewards, avoiding a large critic network.

FP8 mixed‑precision training : combines FP8 with selective FP16/FP32 to save memory while preserving critical precision.

DualPipe communication : overlaps computation and data transfer across nodes to eliminate pipeline bubbles.

Training follows a multi‑stage pipeline: cold‑start with high‑quality CoT data, large‑scale RL to develop reasoning, generation of SFT data, further fine‑tuning, and final RL with a comprehensive dataset. The resulting model can be distilled into smaller open‑source variants.

Performance and Cost

DeepSeek‑V3 achieves inference speeds of 60 TPS (up from 20 TPS in V2) and ranks at the top of open‑source LLM leaderboards, rivaling state‑of‑the‑art closed models. DeepSeek‑R1 excels in reasoning‑intensive benchmarks, matching OpenAI’s o1 on AIME 2024 (79.8 % vs 79.2 %) and MATH‑500 (97.3 % vs 96.4 %).

Training cost for DeepSeek‑V3 is roughly $5.6 M (≈278.8 k GPU‑hours on H800 at $2/GPU‑hour), about 1/20 of OpenAI’s o1 training budget (estimated hundreds of millions). API calls for DeepSeek‑V3 cost about 1/4 of OpenAI’s o3‑mini and 1/10 of GPT‑4o, while still delivering stronger performance than GPT‑4o‑mini.

Usage Guidance

For reasoning tasks, DeepSeek‑R1 benefits from concise prompts that state the goal, letting the model generate its own CoT. For general tasks, DeepSeek‑V3 requires explicit step‑by‑step prompting and may need additional context or examples. Users are advised to prioritize natural language interaction, watch for hallucinations on factual data, and leverage the model’s deep‑thinking capabilities for risk assessment, reverse‑engineering logic, and cross‑domain solution transfer.

Impact

DeepSeek’s open‑source strategy has accelerated global LLM competition, prompting other vendors to lower prices, release new models, and improve accessibility. Its technical contributions—especially in MoE, MLA, and efficient RL pipelines—offer valuable research insights for the AI community.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Large Language Models DeepSeek AI inference model architecture cost efficiency

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.