GLM-4.7 Beats GPT-5 in Coding Tests at One‑Seventh the Cost

Zhipu's newly released GLM-4.7 model outperforms GPT-5 and Claude Sonnet 4.5 on multiple benchmarks, introduces Vibe Coding for UI generation, offers Interleaved and Preserved Thinking capabilities, is fully open‑source, and has API pricing of roughly one‑seventh that of Claude Sonnet 4.5.

AI Insight Log

At the end of 2025, Zhipu released GLM-4.7, a 358‑billion‑parameter open‑source model marketed as "your new coding partner".

Core Strength: Coding Performance

On the SWE‑bench software‑engineering benchmark, GLM-4.7 scored 73.8%, a gain of nearly 6 percentage points over GLM-4.6. On multilingual programming tasks it improved by 12.9%, and on Terminal Bench 2.0 by 16.5%, which suggests more accurate command‑line operations.

In practical terms, the model can handle complex software‑engineering problems beyond simple code generation.

Vibe Coding: Better UI/UX Output

GLM-4.7 adds a feature called Vibe Coding that optimises generated UI code for modern, clean designs and precise layout sizing. The model also produces well‑formatted PPT slides. Example outputs include dark‑mode websites and artistic particle‑effect pages, which the author describes as "god‑level" for front‑end developers.

[Figure: Vibe Coding example]

Thinking Enhancements

The model introduces Interleaved Thinking, which performs a reasoning step before each operation or tool call instead of relying on blind trial and error. It also supports Preserved Thinking, which retains earlier reasoning across the turns of a multi‑turn dialogue so that long tasks stay stable.
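The two mechanisms can be pictured with a toy agent loop. This is only a minimal sketch of the idea, not GLM-4.7's implementation or API: the `Agent` class, the `think` stand-in, and the `preserve_thinking` flag are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    thought: str  # reasoning recorded before the tool call
    tool: str
    result: str


@dataclass
class Agent:
    """Toy agent (hypothetical): reasons before every tool call."""
    tools: dict[str, Callable[[str], str]]
    preserve_thinking: bool = True  # keep reasoning across turns
    history: list[Step] = field(default_factory=list)

    def think(self, task: str, tool: str) -> str:
        # Stand-in for model reasoning; a real model would generate this text.
        seen = len(self.history)
        return f"Task '{task}': {seen} earlier thought(s) in context; calling {tool}."

    def run_turn(self, task: str, plan: list[tuple[str, str]]) -> list[Step]:
        if not self.preserve_thinking:
            self.history.clear()  # earlier reasoning is dropped at each new turn
        turn = []
        for tool, arg in plan:
            thought = self.think(task, tool)  # interleaved: think, then act
            result = self.tools[tool](arg)
            step = Step(thought, tool, result)
            self.history.append(step)
            turn.append(step)
        return turn


agent = Agent({"upper": str.upper, "reverse": lambda s: s[::-1]})
agent.run_turn("demo", [("upper", "abc"), ("reverse", "abc")])
second = agent.run_turn("demo", [("upper", "xyz")])
print(second[0].thought)
```

With `preserve_thinking=True`, the second turn still sees the two thoughts from the first turn; setting it to `False` would start each turn with an empty history.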

[Figure: Thinking mechanisms]

Benchmark Comparison with Top Models

AIME 25 (Math Competition): GLM-4.7 scores 95.7, surpassing GPT‑5 (94.6) and Claude Sonnet 4.5 (87.0).

GPQA‑Diamond (Expert Q&A): GLM-4.7 scores 85.7, higher than Claude Sonnet 4.5 (83.4).

HLE (Humanity's Last Exam): GLM-4.7 achieves 42.8%, far above Claude Sonnet 4.5's 32.0%.

Across most core leaderboards, GLM-4.7 ranks in the first tier of current AI models and leads in several domains.
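To make the gaps concrete, the short sketch below computes GLM-4.7's lead on each benchmark using only the scores quoted above. The `margins` helper is a hypothetical name introduced for illustration.

```python
# Scores exactly as quoted in the article.
scores = {
    "AIME 25": {"GLM-4.7": 95.7, "GPT-5": 94.6, "Claude Sonnet 4.5": 87.0},
    "GPQA-Diamond": {"GLM-4.7": 85.7, "Claude Sonnet 4.5": 83.4},
    "HLE": {"GLM-4.7": 42.8, "Claude Sonnet 4.5": 32.0},
}


def margins(table, model="GLM-4.7"):
    """Return, per benchmark, how many points `model` leads each rival by."""
    out = {}
    for bench, row in table.items():
        base = row[model]
        out[bench] = {
            rival: round(base - score, 1)
            for rival, score in row.items()
            if rival != model
        }
    return out


for bench, lead in margins(scores).items():
    print(bench, lead)
```

On these figures the lead ranges from 1.1 points (over GPT-5 on AIME 25) to 10.8 points (over Claude Sonnet 4.5 on HLE).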

[Figure: Leaderboard]

Open‑Source, Affordable, Ready to Use

Open‑source: GLM-4.7's weights (358B parameters) are published on Hugging Face, allowing local deployment and full data control.

Cheap: API pricing is roughly one‑seventh that of Claude Sonnet 4.5, the main competitor, giving a better cost‑performance ratio.

Compatibility: Supports major agent frameworks such as Claude Code and Roo Code; existing GLM Coding Plan subscribers can upgrade immediately.
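As a rough illustration of what the one‑seventh ratio means for a budget, the sketch below scales a reference bill. The 70.0 figure is a made-up example spend, not a real price, and `estimated_cost` is a hypothetical helper.

```python
RATIO = 1 / 7  # "roughly one-seventh", as stated in the article


def estimated_cost(reference_cost: float, ratio: float = RATIO) -> float:
    """Scale a bill on the reference service by the quoted price ratio."""
    return reference_cost * ratio


reference_bill = 70.0  # hypothetical monthly spend on the reference service
glm_bill = estimated_cost(reference_bill)
print(f"estimated bill: {glm_bill:.2f}")
print(f"estimated saving: {reference_bill - glm_bill:.2f}")
```

Because the article gives only an approximate ratio, the result is an estimate; actual bills depend on the published per-token prices and the workload.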

