GLM-4.7 Hits Global #6 and Leads Open‑Source LLM Rankings, Outperforming Claude 4.5 Sonnet

GLM-4.7 scores 68 points to rank sixth worldwide and first among open‑source models, surpassing Claude 4.5 Sonnet. It pairs strong reasoning with fast generation, but is pricier than its rivals and trails them in code generation and math.

AI Insight Log

How Significant Is This Ranking?

Artificial Analysis, a widely recognized third‑party LLM evaluator, compiles its Intelligence Index from challenging test suites such as MMLU‑Pro, GPQA Diamond, and Humanity's Last Exam, focusing on genuine reasoning and knowledge depth.

Leaderboard Overview

The top five positions remain dominated by closed‑source giants:

Gemini 3 Pro Preview – 73

GPT‑5.2 – 73

Gemini 3 Flash – 71

Claude Opus 4.5 – 70

GPT‑5.1 – 70

Following them is the headline model GLM‑4.7 with a score of 68, making it the sixth‑ranked model globally and the highest‑scoring open‑source LLM.

1. Sixth Globally, First Among Open‑Source Models

Behind GLM‑4.7 are other notable models:

DeepSeek V3.2 – 67 (DeepSeek, open‑source)

Kimi K2 Thinking – 67 (Moonshot AI, open‑source)

Grok 4 – 66 (xAI)

Claude 4.5 Sonnet – 65 (Anthropic)

GLM‑4.7’s 68 points exceed Claude 4.5 Sonnet’s 65, narrowing the gap to the leading Google and OpenAI models to five points.

2. Chinese Models in a Tight Race at the Top

DeepSeek V3.2 and Kimi K2 Thinking each score 67, just one point behind GLM‑4.7, showing that China’s top‑tier models can now contend with world‑leading systems such as Claude 4.5 Sonnet and Grok 4.

Objective Analysis: Strengths and Weaknesses

Strengths – Hardcore Reasoning

GPQA Diamond (graduate‑level scientific reasoning): 84% correct, on par with Gemini 3 Pro.

IFBench (instruction following): 70% correct.

These results indicate strong capability in complex logical tasks, scientific questions, and natural‑language understanding.

Weaknesses – Code and Math

LiveCodeBench (code generation): 39%, far below DeepSeek V3.2’s 51% and even lower than Claude 4.5 Sonnet’s 42%.

AIME 2025 (math competition): 37%, modest and behind DeepSeek V3.2’s 60%.

Cost and Speed – Fast but Pricier

GLM‑4.7 generates at 98 tokens/s, labeled “Notably fast” by the evaluator.

Pricing: 2.10 USD per 1 M tokens, considerably higher than DeepSeek V3.2’s typical ~0.3 USD per 1 M tokens.

Verbosity: The model consumes 150 M tokens to complete the Intelligence Index, versus an average of 24 M across models, reflecting long chain‑of‑thought outputs that drive up inference cost.
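The cost and verbosity figures above can be combined into a rough end‑to‑end estimate. A minimal sketch, assuming a single blended price per million tokens (real APIs usually price input and output tokens separately, which this ignores), using only the numbers quoted in this article:

```python
def index_run_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Total USD cost for a run that generates `tokens_millions` million tokens
    at a blended price of `usd_per_million` dollars per million tokens."""
    return tokens_millions * usd_per_million

# GLM-4.7: 150M tokens consumed, at $2.10 per 1M tokens
glm_cost = index_run_cost(150, 2.10)

# For contrast: a model at the evaluator's average verbosity (24M tokens),
# priced like DeepSeek V3.2 (~$0.30 per 1M tokens)
avg_cost = index_run_cost(24, 0.30)

print(f"GLM-4.7 Intelligence Index run:        ${glm_cost:,.2f}")  # $315.00
print(f"Average-verbosity, DeepSeek-priced run: ${avg_cost:,.2f}")  # $7.20
```

The point of the comparison is that verbosity multiplies price: under these (simplified) assumptions, GLM‑4.7's long reasoning traces make a full benchmark run roughly 40× more expensive than a terse, cheaply priced model.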

Implications for Developers and Enterprises

For use cases demanding strong logical reasoning and data‑privacy (open‑source or private deployment), GLM‑4.7 is among the best global options. However, for code‑generation tasks or highly cost‑sensitive workloads, DeepSeek V3.2 may offer better value.

Reference: GLM‑4.7 – Intelligence, Performance & Price Analysis
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
