Why DeepSeek‑V3.2‑Exp Lost Performance and How a Simple RoPE Fix Restored It

The Baidu Baige team discovered that DeepSeek‑V3.2‑Exp’s long‑context performance lagged behind the official report, traced the issue to a subtle RoPE layout mismatch in the open‑source inference demo, collaborated with DeepSeek to fix it, and verified that the model’s speed and accuracy fully recovered across multiple benchmarks.

Baidu Intelligent Cloud Tech Hub

Background

DeepSeek released the experimental model DeepSeek‑V3.2‑Exp on 2025‑09‑29. The model adds DeepSeek Sparse Attention (DSA) to the V3.1‑Terminus backbone. DSA consists of two tightly coupled components:

Lightning Indexer: a lightweight scoring module that estimates the relevance of each token to its historical context.

Sparse MLA (Multi‑head Latent Attention): built on the DeepSeek‑V series backbone, it accesses only the top‑k latent entries selected by the Indexer, performing sparse key‑value computation.

By processing only k entries (where k≪L), the attention complexity is reduced from O(L²) to O(L·k). In 128K context windows the end‑to‑end inference cost drops by roughly 50% while performance on more than 15 public benchmarks (e.g., MMLU‑Pro, SWE Verified, BrowseComp) remains comparable to V3.1‑Terminus.
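To make the complexity claim concrete, below is a minimal single‑head PyTorch sketch of the top‑k pattern. Everything here (the function name, the toy shapes, the dense score tensor) is illustrative rather than DeepSeek's actual multi‑head latent implementation:

```python
import torch
import torch.nn.functional as F

def sparse_topk_attention(q, k_cache, v_cache, index_scores, topk):
    """Toy single-head sketch of DSA-style top-k attention.

    q:            (d,)   query for the current token
    k_cache:      (L, d) cached keys for the L preceding tokens
    v_cache:      (L, d) cached values
    index_scores: (L,)   lightweight indexer's relevance score per past token
    """
    L, d = k_cache.shape
    # The cheap indexer, not full attention, decides which k entries matter.
    idx = index_scores.topk(min(topk, L)).indices   # (k,)
    k_sub, v_sub = k_cache[idx], v_cache[idx]       # (k, d) each
    # Softmax attention over k entries: per-token cost O(k) instead of O(L),
    # hence O(L*k) over a sequence rather than O(L^2).
    attn = F.softmax(k_sub @ q / d ** 0.5, dim=-1)  # (k,)
    return attn @ v_sub                             # (d,)

# A 128K-entry cache, but only 2,048 entries are actually attended to.
L, d = 131072, 64
out = sparse_topk_attention(torch.randn(d), torch.randn(L, d),
                            torch.randn(L, d), torch.randn(L), topk=2048)
```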

Performance anomaly observed in production

When deploying DeepSeek‑V3.2‑Exp on Baidu’s Qianfan large‑model platform on 2025‑10‑08, the following degradations were recorded:

SWE Verified (Mini‑SWE‑Agent) score: 47.8 vs. 59.2 for V3.1‑Terminus.

RULER niah_multikey_3 task pass rate at 128K context length: 12.5% vs. 81.25% for V3.1‑Terminus.

Multiple inference back‑ends (SGLang, vLLM) and parameter adjustments (temperature, iteration steps) did not alleviate the issue, indicating that the root cause lay within the model implementation rather than external configuration.

Root‑cause diagnosis: RoPE layout mismatch

DeepSeek’s Inference Demo (commit 8631a81, 2025‑11‑17) mixed two rotary‑position‑embedding (RoPE) layouts:

The MLA module expects an interleaved RoPE arrangement.

The Lightning Indexer originally used a non‑interleaved (block) layout, similar to GPT‑NeoX and Qwen.

The demo failed to distinguish the two modes, so the Indexer's queries and keys were rotated with the interleaved layout it was never trained with. The resulting systematic token‑position offsets degraded the accuracy of the top‑k selection and, with it, overall model performance.
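To see why mixing the two layouts is destructive, here is a toy PyTorch illustration. The helper names and the scalar "angles" are ours for exposition; real implementations precompute per‑position rotation tables, but the dimension‑pairing logic is the point:

```python
import torch

def rope_interleaved(x, cos, sin):
    # Rotates adjacent pairs: (x0, x1), (x2, x3), ... -- the layout MLA expects.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_block(x, cos, sin):
    # Rotates dim i together with dim i + d/2 -- the non-interleaved ("block")
    # layout used by GPT-NeoX-style models and by the Lightning Indexer.
    d2 = x.shape[-1] // 2
    x1, x2 = x[..., :d2], x[..., d2:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Same input, same rotation angles, different pairing of dimensions:
d = 8
x, ang = torch.randn(d), torch.arange(d // 2, dtype=torch.float32)
print(torch.allclose(rope_interleaved(x, ang.cos(), ang.sin()),
                     rope_block(x, ang.cos(), ang.sin())))   # False
```

Applying the wrong function to a module trained with the other layout rotates the wrong dimension pairs together, which shows up as exactly the kind of systematic position error described above.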

Official fix and community collaboration

DeepSeek introduced a lightweight fix (sketched in code after this list):

Added a boolean interleaved parameter to the apply_rotary_emb function, allowing explicit selection of the RoPE layout.

Modified the Indexer call to pass interleaved=False, preserving its non‑interleaved (block) behavior.

Retained the default interleaved=True for MLA, keeping compatibility with the backbone.
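Reusing the two layout helpers from the previous sketch, the shape of the patch is roughly as follows. The actual apply_rotary_emb in the demo differs in detail (it builds its rotation tables differently), so read this as an outline of the change rather than the literal diff:

```python
import torch

def apply_rotary_emb(x, cos, sin, interleaved: bool = True):
    # interleaved=True  -> adjacent-pair layout (MLA path, the unchanged default)
    # interleaved=False -> block layout (what the Lightning Indexer was trained with)
    return rope_interleaved(x, cos, sin) if interleaved else rope_block(x, cos, sin)

# Toy tensors standing in for a per-token query slice and position tables.
d = 8
q, ang = torch.randn(d), torch.arange(d // 2, dtype=torch.float32)
cos, sin = ang.cos(), ang.sin()

q_mla = apply_rotary_emb(q, cos, sin)                     # MLA: default kept
q_idx = apply_rotary_emb(q, cos, sin, interleaved=False)  # Indexer: the fix
```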

The change touches only a dozen lines of code and does not alter model weights. The corrected inference code is available at:

https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference

Baidu Baige integrated the same adaptation into its proprietary inference engine and contributed the patch to the SGLang project (PR #13495):

https://github.com/sgl-project/sglang/pull/13495

Performance fully restored

Post‑fix evaluations show restored parity and, in some cases, improvements:

SWE Verified score improved to 59.6, matching V3.1‑Terminus (59.2).

RULER niah_multikey_3 pass rate recovered to 100% across the full 4K‑128K length range, with the 32K‑128K segments surpassing V3.1‑Terminus.

Implications for AI infrastructure

The incident highlights the importance of rigorous open‑source validation and close collaboration between model developers and infrastructure teams. The RoPE layout fix, validated in production, has been fed back to the community, improving the stability and efficiency of generative AI deployments.
