GLM-5.2 Tops Code Arena Benchmarks and Goes Open Source
GLM-5.2, the newly released open‑source LLM from Zhipu, achieves the #1 ranking on Code Arena’s global blind‑test, supports a 1 million‑token context, introduces architectural innovations like IndexShare and MTP, and delivers competitive benchmark results against leading closed‑source models.
GLM-5.2 has been launched and open‑sourced by Zhipu. In the Code Arena blind‑test, which involves millions of users, GLM-5.2 secured the global #1 spot, surpassing the still‑locked Claude Fable 5, and also topped the largest crowdsourced design benchmark.
Long‑Task Benchmarks
Supporting a 1 million‑token context, GLM-5.2 excels on three long‑task programming benchmarks. On FrontierSWE, it trails Opus 4.8 by only 1 % while leading GPT‑5.5 by 1 % and Opus 4.7 by 11 %. In PostTrainBench, with a single H100 GPU, GLM-5.2 outperforms Opus 4.7 and GPT‑5.5, ranking second only to Opus 4.8. SWE‑Marathon shows a 13 % gap to Opus 4.8, keeping GLM-5.2 in second place overall. On standard programming benchmarks, GLM-5.2 is the strongest open‑source model, improving from GLM‑5.1: Terminal‑Bench 2.1 scores 81.0 vs 63.5, and SWE‑bench Pro scores 62.1 vs 58.4. Its 81.0 on Terminal‑Bench 2.1 is only a few points below Claude Opus 4.8’s 85.0 and ahead of Gemini 3.1 Pro.
GLM-5.2 introduces an "effort level" control, letting users balance model capability against speed and compute cost.
Architectural Breakthroughs
To enable 1 M context, GLM-5.2 adopts Dynamic Sparse Attention (DSA) with IndexShare. Every four transformer layers share a lightweight indexer placed at the first layer, allowing the top‑k indices to be reused and cutting the per‑token FLOPs by 2.9×. Training from the middle stage used a 128 K sequence length with IndexShare, reducing compute compared to GLM‑5.1.
The Multi‑Token Prediction (MTP) layer, used for speculative decoding, also incorporates IndexShare, lowering draft model cost and raising acceptance length by 20 % in ablation studies.
Inference Engine Optimizations
Three directions were pursued: finer‑grained memory management and parallelism on top of LayerSplit to enlarge KV‑cache capacity; kernel optimizations that better coordinate cache transfer pipelines, minimizing the impact of cache movement on prefill and decode performance; and CPU‑side cache management, request scheduling, and runtime path improvements that reduce GPU pipeline bubbles. These changes give GLM‑5.2 increasing throughput advantages as context length grows.
Open‑Source Release and Getting Started
The full score table covering inference, programming, and agent tasks is publicly available. GLM‑5.2 can be used in ZCode, Claude Code, OpenCode, and a web UI. Subscribers to Coding Plan automatically receive the model; switching the model name to GLM‑5.2 enables the 1 M context, with selectable effort levels (High or Max).
Model weights are released on HuggingFace, ModelScope, and can be deployed locally with transformers, vLLM, SGLang, xLLM, ktransformers, etc. Online inference runs on multiple Chinese hardware platforms (Cambricon, Ascend, etc.), achieving high throughput, low latency, and large concurrency on domestic chip clusters.
GLM‑5.2 is MIT‑licensed, with no regional restrictions and free access.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
SuanNi
A community for AI developers that aggregates large-model development services, models, and compute power.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
