Grok 4 Unveiled: Why xAI Claims Its New Model Beats the Competition

On July 10, xAI launched Grok 4, a multimodal LLM with a 256K‑token context window, upgraded tool use, and benchmark scores that xAI says surpass existing models. It is priced at $30/month for the standard tier and $300/month for the heavy tier.

Baobao Algorithm Notes

Jul 10, 2025

Key Highlights of Grok 4

Reasoning gains across domains – on many benchmarks Grok 4 outperforms current leading models.

256K‑token context window – larger than the context windows of many competing LLMs.

Multimodal interaction – supports voice, images, and code, with video capabilities to come.

Reinforcement learning plus tool use – raises the HLE (Humanity's Last Exam) score to 50.7%.

Future versions – programming model, multimodal agents, video generation.

Elon Musk’s Assessment

“Grok 4’s reasoning ability already exceeds humans; it can score full marks on the SAT and near‑perfect scores on GRE subjects.”

Musk added that the model could soon enable genuine scientific discoveries.

Benchmark Performance (SOTA Results)

HLE (Humanity's Last Exam; math, chemistry, logic) – a top score of 50.7% with tool use, beating the previous SOTA of 41.0%.

ARC‑AGI‑2 (higher‑order reasoning) – 15.9%, roughly double the score of Claude Opus and other commercial models.

AIME 25 (American Invitational Mathematics Examination) – a perfect 100%, unmatched by peer models.

USAMO 25 (US Math Olympiad) – SOTA on top‑level high‑school problems.

LCB (LiveCodeBench) programming benchmark – leading performance; the upcoming Grok 4 Code is expected to improve on it.

ARC‑AGI‑2 is a notoriously difficult benchmark from the ARC Prize Foundation. Its abstract pattern‑reasoning puzzles resemble the logic questions on civil‑service aptitude tests, and even humans find them hard.
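To put the HLE figures quoted above in perspective, the short sketch below computes the absolute and relative improvement over the previous best score. It uses only the two numbers given in this article; the helper name `gains` is illustrative.

```python
def gains(new_score: float, old_score: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_score - old_score
    relative = absolute / old_score * 100
    return absolute, relative

# HLE figures as quoted in this article: 50.7% with tool use vs. a previous best of 41.0%.
abs_gain, rel_gain = gains(50.7, 41.0)
print(f"HLE: +{abs_gain:.1f} points, {rel_gain:.1f}% relative improvement")
```

On these numbers the jump is 9.7 percentage points, or about a 23.7% relative improvement over the previous SOTA.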

Reasoning Mechanism Evolution

Grok 2 – traditional token prediction.

Grok 3 – introduced RL fine‑tuning for deeper reasoning.

Grok 4 – RL compute increased ten‑fold, markedly improving complex reasoning.

Grok 4 also integrates stronger tool use, allowing real‑time web access, calculator calls, and code execution environments.
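The tool‑use pattern described here can be pictured as a dispatch step: the model emits a structured request naming a tool, the host program runs that tool, and the result is fed back to the model. The sketch below is a generic illustration of that idea, not xAI's implementation; the request format, the `TOOLS` registry, and the `calculator` tool are all hypothetical.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression by walking its AST (no eval())."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

# Hypothetical registry; a real system would also register web search and code execution.
TOOLS = {"calculator": calculator}

def dispatch(tool_call: dict):
    """Run the tool named in a model-issued request and return its result."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call["arguments"])

# Example request, in the shape a model might emit it (format assumed for illustration).
result = dispatch({"name": "calculator", "arguments": {"expression": "2 * (3 + 4) ** 2"}})
print(result)
```

Keeping tools behind a registry like this means the host, not the model, decides what can actually be executed.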

Pricing and Availability

Grok 4 is accessible via the API as model version grok-4-0709, and to subscribers under two plans:

Standard – $30 per month or $300 per year.

Heavy – $300 per month or $3000 per year, aimed at high‑end developers and professional users.
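As a quick check on the two plans above, this sketch compares twelve monthly payments with the yearly price. The prices are the ones quoted in this article; the `PLANS` table and `yearly_saving` helper are illustrative names.

```python
# Prices as quoted in this article (USD).
PLANS = {
    "Standard": {"monthly": 30, "yearly": 300},
    "Heavy": {"monthly": 300, "yearly": 3000},
}

def yearly_saving(monthly: int, yearly: int) -> tuple[int, float]:
    """Return (dollars saved per year, saving in percent) when billed yearly."""
    full_year = monthly * 12
    saved = full_year - yearly
    return saved, saved / full_year * 100

for name, price in PLANS.items():
    saved, pct = yearly_saving(price["monthly"], price["yearly"])
    print(f"{name}: save ${saved} per year ({pct:.1f}%)")
```

Both plans work out to the same discount: yearly billing costs the equivalent of ten months, a saving of about 16.7%.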

The Heavy tier runs a multi‑agent system with ten times the reasoning time: several agents work on a problem in parallel, and their results are aggregated to select the best solution, which is how it achieves its high HLE score.
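The article does not say how the Heavy tier picks the best solution. One simple way to aggregate parallel agents is a majority vote over their answers, and the sketch below assumes exactly that. The `run_agent` stub is a hypothetical stand‑in for a real model call, and its canned answers exist only to make the example deterministic.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one agent; a real agent would call the model."""
    # Deterministic stub: agents 0-2 answer "42", any further agent answers "41".
    return "42" if agent_id < 3 else "41"

def solve_with_agents(question: str, n_agents: int = 4) -> str:
    """Run the agents in parallel and return the most common answer (majority vote)."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda i: run_agent(i, question), range(n_agents)))
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(solve_with_agents("What is 6 * 7?"))
```

Majority voting only works when answers can be compared for equality; open‑ended outputs would need a different selection step, such as a separate scoring model.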

Roadmap

August: Grok 4 Code (programming‑enhanced version).

September: Multimodal agents.

October: Video generation model.

Conclusion

With its large context, multimodal fusion, and advanced tool integration, Grok 4 positions xAI alongside GPT‑5 and Claude 4 Opus in the top tier of large‑model competition, though pricing remains a factor for broader adoption.
