GLM-4.7 Review: How the New Model Beats Competitors in Coding and Reasoning

The GLM-4.7 model launches with record‑breaking benchmark scores across coding, reasoning, and real‑world programming tasks, outperforming both open‑source and commercial LLMs while introducing advanced interleaved, retained, and round‑level thinking modes that enhance complex task execution.

Baobao Algorithm Notes

Coding Benchmark Performance

GLM‑4.7 was evaluated on the Code Arena blind‑test platform, which aggregates results from millions of users. It achieved the top rank among open‑source and domestic models, surpassing GPT‑5.2. Key benchmark scores are:

BrowseComp (web tasks): 67.5

τ²‑Bench (interactive tool use): 87.4 — open‑source state of the art, beating Claude Sonnet 4.5

Humanity's Last Exam (HLE): 42.8% — a 41% improvement over GLM‑4.6 and higher than GPT‑5.1

LiveCodeBench V6: 84.9 — open‑source state of the art, surpassing Claude Sonnet 4.5

SWE‑bench Verified: 73.8% (+5.8 points vs GLM‑4.6) — open‑source first place

SWE‑bench Multilingual: 66.7% (+12.9 points vs GLM‑4.6)

Terminal‑Bench 2.0: 41% (+16.5 points vs GLM‑4.6)

Real‑World Programming Evaluation

The model was tested on 100 real‑world programming tasks covering front‑end, back‑end, and instruction‑following capabilities. Three representative cases illustrate its versatility.

Case 1 – Front‑end Aesthetic Generation

Prompt:

Write the code for the Hero Section of a "security monitoring dashboard" using HTML and Tailwind CSS (or CSS variables). Requirements:
- Dark mode, with a deep blue primary color and a warning-orange accent.
- Glassmorphism real-time data cards with background blur and thin, highlighted borders.
- An asymmetric title layout conveying a high-tech feel with generous whitespace.
- Responsive, adapting to both mobile screens and 2K resolution.
- Output a single HTML file only, runnable as-is.

The model returned a single self‑contained HTML file that renders a dark‑mode hero section with the specified glass‑morphism card, responsive layout, and the required color scheme.

Case 2 – High‑Concurrency Back‑end Service

Prompt:

Design and implement a distributed reward-logging system supporting 10k concurrent writes per second (for large-scale reinforcement-learning training).
- Implement the core write logic as a Python class or Go function.
- Include a Redis cache layer and a MySQL persistence layer; the synchronization strategy must prevent cache breakdown.
- Explain how to guarantee idempotency of reward updates in a distributed environment.
- Provide code snippets for error handling.

The generated code defines a thread‑safe writer that acquires a Redis lock, checks for existing entries to ensure idempotency, writes to MySQL within a transaction, and includes retry logic for cache misses and network failures.
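The idempotent write pattern described above can be sketched as follows. This is a minimal illustration, not the model's actual output: in‑memory dictionaries stand in for Redis and MySQL, a process‑local lock stands in for a distributed Redis lock, and the function and key names are assumptions for demonstration.

```python
import threading

# In-memory stand-ins for the Redis cache and MySQL store. Real code
# would use redis-py for the lock/cache and a MySQL driver with
# transactions for persistence.
redis_cache = {}
mysql_store = {}
_lock = threading.Lock()  # stand-in for a distributed Redis lock

def write_reward(record_id: str, reward: float) -> bool:
    """Write a reward record idempotently.

    Returns True if the record was written, False if it already
    existed (the duplicate write is dropped, making retries safe).
    """
    with _lock:  # acquire the (simulated) distributed lock
        # Idempotency check: consult the cache first, then the store,
        # so a retried request never produces a second row.
        if record_id in redis_cache or record_id in mysql_store:
            return False
        # Persist first (in real code: inside a MySQL transaction),
        # then populate the cache so reads do not all fall through
        # to the database at once (cache breakdown).
        mysql_store[record_id] = reward
        redis_cache[record_id] = reward
        return True
```

Retrying the same `record_id` is a no‑op, which is the idempotency property the prompt asked the model to guarantee.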

Case 3 – End‑to‑End System Specification

Prompt:

Write a complete technical requirements document (PRD) for an "industrial-park intelligent security system", including:
1. A four-layer system architecture (perception, transport, analysis, and application layers).
2. Core logic: the workflow for few-shot anomalous-behavior detection using a VLM model.
3. Data flow: a Markdown table listing latency metrics from capture to alarm triggering.
4. Fault handling: three hardware-failure scenarios with redundancy plans.
5. Delivery criteria: acceptance KPIs for three project phases.
The document must be logically self-consistent, use professional terminology, and avoid filler.

The model produced a structured PRD with a clear four‑layer architecture diagram, a step‑by‑step VLM‑based anomaly detection pipeline, a latency table, redundancy strategies for sensor, network, and compute failures, and measurable KPI milestones for functional, performance, and reliability testing.

Advanced Reasoning Modes

GLM‑4.7 introduces three configurable reasoning modes that affect how the model processes multi‑turn or tool‑calling interactions:

Interleaved thinking: The model inserts a deliberate "thinking" step before each response or tool invocation, improving adherence to complex prompts and code quality.

Retained thinking: Thought fragments are automatically cached across dialogue turns, increasing cache‑hit rates and reducing inference cost for long‑running tasks.

Round‑level thinking: Users can enable or disable the thinking step on a per‑conversation‑round basis, allowing low‑latency responses for simple queries while retaining high accuracy for demanding tasks.

Compared with GLM‑4.6, these modes reduce hallucinations, improve prompt compliance, and yield more reliable end‑to‑end code generation, especially in multi‑step engineering workflows.
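As a rough illustration of round‑level thinking, a client could toggle the thinking step per request. The payload shape and the `thinking` field below are hypothetical, chosen for illustration only; they are not the documented GLM‑4.7 API.

```python
def build_chat_request(messages: list, enable_thinking: bool) -> dict:
    """Build a chat-completion payload with thinking toggled per round.

    Hypothetical sketch: the "thinking" field and its values are
    assumptions for illustration, not a documented API contract.
    """
    return {
        "model": "glm-4.7",
        "messages": messages,
        # Disable thinking for simple, latency-sensitive rounds;
        # enable it for multi-step tool-calling rounds.
        "thinking": {"type": "enabled" if enable_thinking else "disabled"},
    }

# A simple query skips thinking for low latency.
fast = build_chat_request(
    [{"role": "user", "content": "What is 2+2?"}], enable_thinking=False
)
```

The per‑round switch is what distinguishes this mode from a global model setting: each turn pays for deliberation only when the task warrants it.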

Written by Baobao Algorithm Notes, author of the BaiMian large model, offering technology and industry insights.