GLM-4.7 Hits Global #6 and Leads Open‑Source LLM Rankings, Outperforming Claude 4.5 Sonnet
GLM-4.7 scores 68 points to rank sixth worldwide and first among open‑source models, surpassing Claude 4.5 Sonnet. It pairs strong reasoning and fast generation speed with higher cost and weaker code‑generation and math performance than rivals.
How Significant Is This Ranking?
Artificial Analysis, a widely recognized third‑party LLM evaluator, compiles its Intelligence Index from challenging test suites such as MMLU‑Pro, GPQA Diamond, and Humanity's Last Exam, focusing on genuine reasoning and knowledge depth.
Leaderboard Overview
The top five positions remain dominated by closed‑source giants:
Gemini 3 Pro Preview – 73
GPT‑5.2 – 73
Gemini 3 Flash – 71
Claude Opus 4.5 – 70
GPT‑5.1 – 70
Following them is the headline model GLM‑4.7 with a score of 68, making it the sixth‑ranked model globally and the highest‑scoring open‑source LLM.
1. Sixth Globally, First Among Open‑Source Models
Behind GLM‑4.7 are other notable models:
DeepSeek V3.2 – 67 (open‑source)
Kimi K2 Thinking – 67 (Moonshot AI)
Grok 4 – 66 (xAI)
Claude 4.5 Sonnet – 65 (Anthropic)
GLM‑4.7’s 68 points exceed Claude 4.5 Sonnet’s 65, narrowing the gap to the leading Google and OpenAI models to five points.
2. Chinese Models in a Close Contest at the Top
DeepSeek V3.2 and Kimi K2 Thinking each score 67, just one point behind GLM‑4.7, showing that China’s top‑tier models can now contend with world‑leading systems such as Claude 4.5 Sonnet and Grok 4.
Objective Analysis: Strengths and Weaknesses
Strengths – Hardcore Reasoning
GPQA Diamond (graduate‑level scientific reasoning): 84% correct, on par with Gemini 3 Pro.
IFBench (instruction following): 70% correct.
These results indicate strong capability in complex logical tasks, scientific questions, and natural‑language understanding.
Weaknesses – Code and Math
LiveCodeBench (code generation): 39%, far below DeepSeek V3.2’s 51% and even lower than Claude 4.5 Sonnet’s 42%.
AIME 2025 (math competition): 37%, modest and behind DeepSeek V3.2’s 60%.
Cost and Speed – Fast but Pricier
GLM‑4.7 generates at 98 tokens/s, labeled “Notably fast” by the evaluator.
Pricing: 2.10 USD per 1M tokens, considerably higher than DeepSeek V3.2’s typical ~0.3 USD per 1M tokens.
Verbosity: The model consumes 150M tokens to complete the Intelligence Index, versus an average of 24M tokens across models, indicating heavy chain‑of‑thought reasoning that raises inference cost.
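The verbosity figure compounds the per‑token price. A back‑of‑envelope estimate using the numbers above (treating the 2.10 USD rate as a flat per‑million‑token price, a simplifying assumption that ignores any input/output price split):

```python
def benchmark_cost(tokens_millions: float, usd_per_million: float) -> float:
    """Rough cost of a benchmark run: tokens generated times price per million."""
    return tokens_millions * usd_per_million

# GLM-4.7: 150M tokens at 2.10 USD per 1M tokens
glm_cost = benchmark_cost(150, 2.10)

# A hypothetical model emitting the 24M-token average at the same price
avg_cost = benchmark_cost(24, 2.10)

print(f"GLM-4.7 run: ${glm_cost:.2f}, average-verbosity run: ${avg_cost:.2f}")
```

On these assumptions, the same benchmark run costs roughly 315 USD for GLM‑4.7 versus about 50 USD for an average‑verbosity model at the same price, which is why token efficiency matters as much as the headline per‑million rate.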
Implications for Developers and Enterprises
For use cases demanding strong logical reasoning and data‑privacy (open‑source or private deployment), GLM‑4.7 is among the best global options. However, for code‑generation tasks or highly cost‑sensitive workloads, DeepSeek V3.2 may offer better value.
Reference: GLM‑4.7 – Intelligence, Performance & Price Analysis
This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
