DeepSeek-V3.2 Architecture Breakthrough: A 5‑Minute Guide to Its Core Features
The article introduces DeepSeek-V3.2, highlighting its new DeepSeek Sparse Attention (DSA) mechanism, which boosts training and inference efficiency by up to 50% and dramatically cuts usage costs. It also covers the updated API endpoints and the four-stage post-training pipeline that underpins the model's performance.
DeepSeek released the V3.2 model only a week after the V3.1 Terminus version, unveiling a novel DeepSeek Sparse Attention (DSA) mechanism that enables fine‑grained dynamic sparse attention for long‑context scenarios, delivering a generational leap in training and inference efficiency.
According to the official announcement, the DSA operator improves training and inference speed by 30%‑50%. Although raw performance metrics are comparable to V3.1, the efficiency gain reduces training cost and provides a solid foundation for future large‑scale, long‑text model training.
The most tangible benefit for developers is a steep drop in usage pricing. Input cost per million tokens falls by 50% (0.2 CNY for cache-hit input, 2 CNY for regular input), and output cost drops by 75% to 3 CNY per million tokens, effectively resetting the price floor for large language models.
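For a concrete sense of the numbers, here is a back-of-the-envelope cost estimate at the prices quoted above; the workload volumes are purely hypothetical:

```python
# Back-of-the-envelope cost estimate at the V3.2 prices quoted above
# (CNY per million tokens). The workload volumes below are hypothetical.
PRICE_CACHE_INPUT = 0.2  # cache-hit input
PRICE_INPUT = 2.0        # regular input
PRICE_OUTPUT = 3.0       # output

def cost_cny(cached_in_m, fresh_in_m, out_m):
    """Cost in CNY for token volumes given in millions of tokens."""
    return (cached_in_m * PRICE_CACHE_INPUT
            + fresh_in_m * PRICE_INPUT
            + out_m * PRICE_OUTPUT)

# e.g. 100M cached input, 50M fresh input, 20M output tokens
print(cost_cny(100, 50, 20))  # -> 180.0
```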
API-wise, V3.2 fully replaces V3.1. Developers can invoke the chat mode via deepseek-chat and the reasoning mode via deepseek-reasoner. The legacy V3.1‑Terminus endpoint remains reachable by changing the base_url to "https://api.deepseek.com/v3.1_terminus_expires_on_20251015".
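In code, the switch looks roughly like the sketch below, assuming DeepSeek's OpenAI-compatible client interface; the API key is a placeholder:

```python
# Minimal sketch of calling V3.2 and the legacy endpoint, assuming
# DeepSeek's OpenAI-compatible API. The API key is a placeholder.
from openai import OpenAI

# V3.2: default endpoint; pick chat or reasoning mode by model name.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner" for reasoning mode
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

# Legacy V3.1-Terminus: same client, different base_url.
legacy = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.deepseek.com/v3.1_terminus_expires_on_20251015",
)
```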
DSA's sparse attention replaces exhaustive token-wise weight computation with a lightweight "lightning indexer" that performs a rapid global scan and then focuses precise calculations on the most relevant token blocks. Built on MLA (Multi-head Latent Attention), the indexer and a fine-grained token selection mechanism together achieve the reported efficiency gains.
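The two-step idea is easiest to see in code. The following is a deliberately simplified toy sketch, not DeepSeek's published kernel: the dot-product indexer, the dimensions, and the top-k value are all illustrative assumptions, and details such as causal masking are omitted.

```python
# Toy sketch of indexer-then-select sparse attention. NOT DeepSeek's
# actual DSA/MLA kernel: a cheap low-dimensional scorer picks the top-k
# keys per query, and exact attention runs only over that subset.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    """q, k, v: (seq, d_model); idx_q, idx_k: (seq, d_index) cheap projections."""
    seq, d_model = q.shape
    # Step 1: rapid global scan in a tiny dimension -- cheap per token pair.
    index_scores = idx_q @ idx_k.T                            # (seq, seq)
    sel = index_scores.topk(min(top_k, seq), dim=-1).indices  # (seq, k)
    # Step 2: exact attention only over the selected tokens.
    k_sel, v_sel = k[sel], v[sel]                             # (seq, k, d_model)
    scores = (q.unsqueeze(1) * k_sel).sum(-1) / d_model ** 0.5  # (seq, k)
    attn = F.softmax(scores, dim=-1)
    return (attn.unsqueeze(-1) * v_sel).sum(1)                # (seq, d_model)

seq, d_model, d_index = 1024, 128, 16
q, k, v = (torch.randn(seq, d_model) for _ in range(3))
iq, ik = torch.randn(seq, d_index), torch.randn(seq, d_index)
print(sparse_attention(q, k, v, iq, ik).shape)  # torch.Size([1024, 128])
```

Because the indexer works in a much smaller dimension than the full attention heads, the global scan stays cheap even at long context, which is consistent with the reported 30%-50% speedup.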
The post‑training pipeline consists of four stages: (1) dense warm‑up to train the lightning indexer parameters, (2) sparse training where the full model and DSA are trained together, (3) expert distillation using programming and tool‑calling data, and (4) reinforcement‑learning fine‑tuning with the GRPO algorithm. These stages preserve the original model’s capabilities while substantially lowering inference cost, especially as context length grows.
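Of these stages, the GRPO step is the most algorithmically distinctive: instead of training a separate value model, it normalizes each sampled response's reward against the other responses for the same prompt. A minimal sketch of that group-relative advantage computation, with hypothetical reward values:

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Several responses are sampled per prompt and each reward is normalized
# within its own group, so no learned critic/value model is needed.
# The reward values below are hypothetical.
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """rewards: (num_prompts, group_size) -> same-shape advantages."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

rewards = np.array([[0.1, 0.9, 0.5, 0.5],   # sampled group for prompt 1
                    [1.0, 0.0, 0.0, 1.0]])  # sampled group for prompt 2
print(grpo_advantages(rewards))
```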
V3.2 is currently an experimental release, fully open‑sourced on GitHub, with model weights available on ModelScope and HuggingFace. The accompanying DSA paper is also published.
In summary, although DeepSeek‑V3.2 does not show a noticeable accuracy jump over V3.1, its DSA mechanism represents a major efficiency innovation that cuts training expenses and opens the door for larger, more capable models, a development the author expects will soon be adopted across mainstream LLM training pipelines.
Fun with Large Models
A Master's graduate of Beijing Institute of Technology with four top-journal publications, formerly a developer at ByteDance and Alibaba, now researching large models at a major state-owned enterprise. Committed to sharing concise, practical AI large-model development experience, in the belief that large AI models will become as essential as the PC. Let's start experimenting now!