GLM-5 Unveiled: 744B Parameters, Claude Opus 4.5‑Level Performance, Epic Agent Upgrade
Z.ai has released GLM‑5, an open‑source model with 744 billion parameters, trained on 28.5 T tokens and built on new Sparse Attention and Slime RL infrastructure. It tops the open‑source rankings and approaches Claude Opus 4.5 on Vending Bench 2 and CC‑Bench‑V2, while adding multi‑scenario agent capabilities.
Parameter Surge: 744 B Model
Compared with GLM‑4.5, GLM‑5 brings four upgrades:
Parameter count: 355 B total (32 B active) → 744 B total (40 B active).
Training data: the pre‑training corpus grows from 23 T to 28.5 T tokens.
Architecture: integrates DeepSeek Sparse Attention (DSA), cutting deployment cost while preserving long‑context capability.
Training infrastructure: a new asynchronous reinforcement‑learning platform, Slime, improves training throughput.
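The "active" parameter counts above imply a mixture‑of‑experts design, where only a fraction of the weights fires per token. A quick back‑of‑the‑envelope sketch, using only the figures quoted in the list, shows that GLM‑5 activates a *smaller* share of its parameters than GLM‑4.5 despite being more than twice the size:

```python
# Rough arithmetic on the GLM-4.5 -> GLM-5 scaling figures quoted above.
# In a mixture-of-experts model, the active-parameter count (not the total)
# is what drives per-token inference cost.

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters activated per forward pass."""
    return active_b / total_b

glm45 = active_fraction(355, 32)  # GLM-4.5: 32 B active of 355 B total
glm5 = active_fraction(744, 40)   # GLM-5:  40 B active of 744 B total

print(f"GLM-4.5 active fraction: {glm45:.1%}")  # ~9.0%
print(f"GLM-5   active fraction: {glm5:.1%}")   # ~5.4%
```

So while the total parameter count roughly doubles, per‑token compute grows only from 32 B to 40 B active parameters.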
Benchmark Performance vs. Claude Opus 4.5
Vending Bench 2 – Long‑term planning
The benchmark simulates operating a vending‑machine company for one year. GLM‑5 ranks first among open‑source models with a final account balance of $4,432, compared to Claude Opus 4.5's $4,967.
CC‑Bench‑V2 – Full‑stack development
In Z.ai's internal CC‑Bench‑V2 suite, GLM‑5 surpasses GLM‑4.7 on front‑end, back‑end, and extended system tasks, and its improved stability on long‑horizon tasks narrows the gap with Claude Opus 4.5.
Beyond Chat: Work‑Oriented Capabilities
Document generation: converts input into correctly formatted .docx, .pdf, and .xlsx files.
Scenario coverage: supports product requirement documents, teaching plans, financial reports, and operation checklists.
The new Agent mode combines these abilities for multi‑turn collaboration that produces concrete deliverables.
Getting Started
Online demo: access the model at chat.z.ai and select GLM‑5.
Developer integration: GLM‑5 is included in the GLM Coding Plan and can be integrated into coding agents such as Claude Code, OpenCode, and Cline.
Local deployment: model weights are open‑sourced on Hugging Face and ModelScope; compatible with the vLLM and SGLang inference frameworks and adapted for Huawei Ascend and Moore Threads chips.
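For the vLLM route, local serving might look like the sketch below. The repository id `zai-org/GLM-5` and the parallelism setting are assumptions for illustration, not confirmed values; check the Hugging Face model card for the published name and the recommended settings for a model this size:

```shell
# Hypothetical sketch: serve GLM-5 through vLLM's OpenAI-compatible server.
pip install vllm

# --tensor-parallel-size shards the model across GPUs (8 is illustrative);
# --trust-remote-code allows the custom modeling code GLM checkpoints ship with.
vllm serve zai-org/GLM-5 --tensor-parallel-size 8 --trust-remote-code

# Once up, any OpenAI-compatible client can target http://localhost:8000/v1.
```

A 744 B‑parameter checkpoint will not fit on a single consumer GPU even with only 40 B active parameters, so multi‑GPU tensor parallelism (or a quantized variant, if one is published) should be expected.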
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
