GLM-5 Unveiled: 744B‑Parameter Model Takes on Claude in Complex Tasks

GLM-5, the new 744‑billion‑parameter open‑source LLM, builds on GLM‑4.5 with the GlmMoeDsa architecture, scores higher than Claude Opus 4.5 on the HLE benchmark, demonstrates strong long‑context and agent capabilities, supports vLLM/SGLang, runs on a range of Chinese chips, and can directly generate Office documents.


GLM-5 has been officially released as an open‑source large language model with 744 billion parameters (40 billion active) and a pre‑training corpus of 28.5 trillion tokens, up from GLM‑4.5’s 355 billion parameters (32 billion active) and 23 trillion tokens.

Architecturally, GLM‑5 adopts the GlmMoeDsa design, integrating DeepSeek’s DSA sparse attention and MTP (multi‑token prediction), which lowers deployment cost while preserving long‑context capability. The team also built slime, an asynchronous reinforcement‑learning infrastructure that raises training throughput and enables finer‑grained post‑training iteration.
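
To make the sparse‑activation idea concrete, here is a minimal sketch of top‑k mixture‑of‑experts routing in PyTorch. The layer sizes, expert count, and top‑k value are illustrative placeholders rather than GLM‑5’s actual configuration; the point is simply that a learned router sends each token to a handful of experts, which is why only about 40 billion of the 744 billion parameters are active per token.

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Minimal sketch of sparse mixture-of-experts routing.

    Illustrative only: the sizes, expert count, and top-k below are made up
    and are not GLM-5's real configuration. Each token is routed to top_k
    experts, so only a small slice of the total parameters runs per token.
    """

    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # only top_k experts fire per token
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

# Quick shape check on random activations
layer = TopKMoELayer()
print(layer(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])
```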

On HLE (Humanity’s Last Exam), a reasoning benchmark, GLM‑5 scores 30.5 points, surpassing Claude Opus 4.5’s 28.4 but still below GPT‑5.2’s 35.4; with tool augmentation its score rises to 50.4. On the SWE‑bench Verified programming test it achieves 77.8%, placing it at the top level among open models. Internal CC‑Bench‑V2 evaluations show GLM‑5 markedly outperforming GLM‑4.7 on front‑end, back‑end, and long‑horizon task categories, narrowing the gap to Claude Opus 4.5.

In the Vending Bench 2 scenario, which asks a model to run a virtual vending‑machine business for a year, GLM‑5 ends with a final balance of $4,432, the highest among open models and close to Claude Opus 4.5’s $4,967, while DeepSeek‑V3.2 reaches only $1,034.

GLM‑5 introduces a distinctive ability to directly generate usable Office documents (.docx, .pdf, .xlsx) from a textual requirement, a first for open‑source models.
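
As a rough illustration of how this capability might be exercised, the sketch below sends a document request to a GLM‑5 instance assumed to sit behind an OpenAI‑compatible chat endpoint. The base URL, model name, and the way generated files come back are assumptions made for the example, not the official interface.

```python
# Hypothetical request for an Office document; endpoint and model name are
# placeholders, not the documented GLM-5 API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local OpenAI-compatible server
    api_key="EMPTY",
)

resp = client.chat.completions.create(
    model="glm-5",  # placeholder model id
    messages=[{
        "role": "user",
        "content": "Produce a one-page .docx project brief for a vending-machine "
                   "pilot: objectives, a budget table, and a three-month timeline.",
    }],
)

# How the resulting .docx/.pdf/.xlsx is delivered (inline content, a tool call,
# or a download link) depends on the official GLM-5 tooling; here we just print
# the raw text reply.
print(resp.choices[0].message.content)
```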

Deployment support includes the vLLM and SGLang frameworks, with weights released on HuggingFace and ModelScope. The model runs on a range of Chinese chips—Huawei Ascend, Moore Threads, Cambricon, Kunlun, Suiyuan, and HaiGuang—thanks to kernel optimizations and model quantization that achieve reasonable inference speed.
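
As a starting point, the snippet below shows what offline inference through vLLM’s Python API typically looks like. The HuggingFace repo id, tensor‑parallel size, and flags are guesses modeled on earlier GLM releases; check the official model card for the exact values and hardware requirements.

```python
# pip install vllm  (one of the two serving frameworks named above)
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-5",      # assumed repo id, patterned on zai-org/GLM-4.5
    tensor_parallel_size=8,     # a 744B-parameter MoE model needs a multi-GPU node
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Summarize the trade-offs of sparse attention."], params)
print(outputs[0].outputs[0].text)

# For an OpenAI-compatible HTTP server instead of offline inference:
#   vllm serve zai-org/GLM-5 --tensor-parallel-size 8
```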

Early user feedback is positive: developers report that GLM‑5 outperforms MiniMax M2.1 on SwiftUI tasks; some users identified GLM‑5 as the previously mysterious "Pony Alpha" model on OpenRouter; and voxel‑pagoda scene‑design tests show a style similar to Opus 4.6, though missing details such as torii gates.

As a flagship Chinese open‑source model, GLM‑5 brings strong coding capability that helps the LLM ecosystem move from dialogue toward agent‑oriented engineering applications. Its free‑token plan and the recent Code Plan have attracted many developers, and additional models are expected to be announced after the Chinese New Year.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: vLLM, Large Language Model, Claude, open-source LLM, AI benchmarks, Chinese chips, GLM-5
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
