GLM-4.7 Hits Global #6 and Leads Open‑Source LLM Rankings, Outperforming Claude 4.5 Sonnet

GLM-4.7 scores 68 points to rank sixth worldwide and first among open‑source models, surpassing Claude 4.5 Sonnet. It pairs strong reasoning with fast generation, but is pricier than its rivals and trails them in code generation and math.

AI Insight Log

How Significant Is This Ranking?

Artificial Analysis, a widely recognized third‑party LLM evaluator, compiles its Intelligence Index from challenging test suites such as MMLU‑Pro, GPQA Diamond, and Humanity's Last Exam, focusing on genuine reasoning and knowledge depth.

Leaderboard Overview

The top five positions remain dominated by closed‑source giants:

Gemini 3 Pro Preview – 73

GPT‑5.2 – 73

Gemini 3 Flash – 71

Claude Opus 4.5 – 70

GPT‑5.1 – 70

Following them is the headline model GLM‑4.7 with a score of 68, making it the sixth‑ranked model globally and the highest‑scoring open‑source LLM.

1. Sixth Globally, First Among Open‑Source Models

Behind GLM‑4.7 are other notable models:

DeepSeek V3.2 – 67 (DeepSeek, open‑source)

Kimi K2 Thinking – 67 (Moonshot AI, open‑source)

Grok 4 – 66 (xAI)

Claude 4.5 Sonnet – 65 (Anthropic)

GLM‑4.7’s 68 points exceed Claude 4.5 Sonnet’s 65, narrowing the gap to the leading Google and OpenAI models to five points.

2. Chinese Models in a Tight Race at the Top

DeepSeek V3.2 and Kimi K2 Thinking each score 67, just one point behind GLM‑4.7, showing that China’s top‑tier models can now contend with world‑leading systems such as Claude 4.5 Sonnet and Grok 4.

Objective Analysis: Strengths and Weaknesses

Strengths – Hardcore Reasoning

GPQA Diamond (graduate‑level scientific reasoning): 84% correct, on par with Gemini 3 Pro.

IFBench (instruction following): 70% correct.

These results indicate strong capability in complex logical tasks, scientific questions, and natural‑language understanding.

Weaknesses – Code and Math

LiveCodeBench (code generation): 39%, far below DeepSeek V3.2’s 51% and even lower than Claude 4.5 Sonnet’s 42%.

AIME 2025 (math competition): 37%, modest and behind DeepSeek V3.2’s 60%.

Cost and Speed – Fast but Pricier

GLM‑4.7 generates at 98 tokens/s, labeled “Notably fast” by the evaluator.

Pricing: 2.10 USD per 1 M tokens, considerably higher than DeepSeek V3.2’s typical ~0.3 USD per 1 M tokens.

Verbosity: The model consumes 150 M tokens to complete the Intelligence Index, versus an average of 24 M across models, reflecting long chain‑of‑thought outputs that drive up inference cost.
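The cost and verbosity figures above can be combined into a rough end‑to‑end estimate. A minimal sketch, assuming a single blended price per million tokens (real APIs usually price input and output tokens separately, which this ignores), using only the numbers quoted in this article:

```python
def index_run_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Total USD cost for a run that generates `tokens_millions` million tokens
    at a blended price of `usd_per_million` dollars per million tokens."""
    return tokens_millions * usd_per_million

# GLM-4.7: 150M tokens consumed, at $2.10 per 1M tokens
glm_cost = index_run_cost(150, 2.10)

# For contrast: a model at the evaluator's average verbosity (24M tokens),
# priced like DeepSeek V3.2 (~$0.30 per 1M tokens)
avg_cost = index_run_cost(24, 0.30)

print(f"GLM-4.7 Intelligence Index run:        ${glm_cost:,.2f}")  # $315.00
print(f"Average-verbosity, DeepSeek-priced run: ${avg_cost:,.2f}")  # $7.20
```

The point of the comparison is that verbosity multiplies price: under these (simplified) assumptions, GLM‑4.7's long reasoning traces make a full benchmark run roughly 40× more expensive than a terse, cheaply priced model.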

Implications for Developers and Enterprises

For use cases demanding strong logical reasoning and data‑privacy (open‑source or private deployment), GLM‑4.7 is among the best global options. However, for code‑generation tasks or highly cost‑sensitive workloads, DeepSeek V3.2 may offer better value.

Reference: GLM‑4.7 – Intelligence, Performance & Price Analysis
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
