Grok 4 Unveiled: Why xAI Claims Its New Model Beats the Competition
On July 10, xAI launched Grok 4, a multimodal LLM with a 256K-token context window, upgraded tool use, and benchmark scores that xAI says surpass existing models. It is priced at $30/month for the Standard tier and $300/month for the Heavy tier.
Key Highlights of Grok 4
All-domain reasoning boost – outperforms current leading models on many benchmarks.
256K‑token context window – dramatically larger than typical LLMs.
Multimodal interaction – supports voice, images, code and upcoming video capabilities.
Reinforcement learning plus tool use – the HLE (Humanity's Last Exam) score rises to 50.7%.
Planned releases – a coding model, multimodal agents, and video generation.
Elon Musk’s Assessment
“Grok 4’s reasoning ability already exceeds humans; it can score full marks on the SAT and near‑perfect scores on GRE subjects.”
Musk added that the model could soon enable genuine scientific discoveries.
Benchmark Performance (SOTA Results)
HLE (Humanity's Last Exam, covering math, chemistry, logic and more) – a top score of 50.7% with tool use, beating the previous SOTA of 41.0%.
ARC‑AGI‑2 (High‑order reasoning) – 15.9%, roughly double the performance of Claude Opus and other commercial models.
AIME 25 (American Invitational Mathematics Examination) – a perfect 100%, unmatched by peer models.
USAMO 25 (US Math Olympiad) – SOTA on top‑level high‑school problems.
LCB (LiveCodeBench) coding benchmark – leading performance; the upcoming Grok 4 Code is expected to improve on it further.
ARC-AGI-2 is a notoriously difficult benchmark from the ARC Prize Foundation. Its abstract pattern-reasoning puzzles are comparable to questions on civil-service aptitude tests, and even humans can struggle with them.
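To put the HLE figures above in perspective, the gap between 50.7% and the previous 41.0% can be expressed both in percentage points and as a relative improvement. The snippet below is only a small arithmetic sketch; the helper name `gain` is made up for illustration.

```python
def gain(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)

# HLE scores quoted in the article: 50.7% (Grok 4 with tools) vs 41.0% (previous SOTA).
hle_abs, hle_rel = gain(50.7, 41.0)
print(f"HLE: +{hle_abs} points, +{hle_rel}% relative")
# prints "HLE: +9.7 points, +23.7% relative"
```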
Reasoning Mechanism Evolution
Grok 2 – traditional token prediction.
Grok 3 – introduced RL fine‑tuning for deeper reasoning.
Grok 4 – RL compute increased ten‑fold, markedly improving complex reasoning.
Grok 4 also integrates stronger tool use, giving it real-time web access, calculator calls, and code-execution environments.
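How tool use typically works on the host side can be sketched as follows: the model emits a structured tool call, the host runs the matching function, and the result is handed back to the model. This is a generic illustration, not xAI's implementation; the registry `TOOLS`, the call format, and the function names are all hypothetical.

```python
import ast
import operator

# Arithmetic operators the illustrative calculator tool accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression (+, -, *, /)."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

# Hypothetical tool registry: tool name -> callable.
TOOLS = {"calculator": calculator}

def run_tool_call(call: dict) -> dict:
    """Execute one model-issued tool call and package the result for the model."""
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        return {"tool": name, "error": "unknown tool"}
    try:
        return {"tool": name, "result": TOOLS[name](**args)}
    except Exception as exc:
        # Errors are returned as data so the model can react to them.
        return {"tool": name, "error": str(exc)}

print(run_tool_call({"name": "calculator",
                     "arguments": {"expression": "2 * (3 + 4)"}}))
```

Returning errors as data rather than raising lets the model see what went wrong and retry with a corrected call.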
Pricing and Availability
Grok 4 is available through the API as version grok-4-0709 and comes in two subscription plans:
Standard – $30 per month or $300 per year.
Heavy – $300 per month or $3000 per year, aimed at high‑end developers and professional users.
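A quick check of the prices listed above shows what the annual option saves compared with paying monthly for twelve months. The figures come from the article; the `PLANS` table and `annual_saving` helper are illustrative names only.

```python
# (monthly USD, yearly USD) as quoted in the article.
PLANS = {"Standard": (30, 300), "Heavy": (300, 3000)}

def annual_saving(monthly: int, yearly: int) -> tuple[int, float]:
    """Return (USD saved per year, percent saved) versus 12 monthly payments."""
    full_year = monthly * 12
    saving = full_year - yearly
    return saving, round(saving / full_year * 100, 1)

for name, (monthly, yearly) in PLANS.items():
    saving, pct = annual_saving(monthly, yearly)
    print(f"{name}: save ${saving}/year ({pct}%)")
# prints "Standard: save $60/year (16.7%)"
# prints "Heavy: save $600/year (16.7%)"
```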
The Heavy tier runs a multi-agent system with ten times the reasoning time: several agents work in parallel, and their results are aggregated to select the best solution, which is how it reaches its high HLE score.
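The article does not say how the Heavy tier picks the best solution among its agents. As one plausible illustration, the sketch below runs several agents in parallel and uses a simple majority vote; the voting rule, the function name `solve_with_agents`, and the stub agents are all assumptions, not xAI's actual method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def solve_with_agents(question: str,
                      agents: Sequence[Callable[[str], str]]) -> str:
    """Run every agent on the question in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves agent order in the returned answers.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote stands in for the unspecified "select the best" step.
    return Counter(answers).most_common(1)[0][0]

# Stub agents standing in for independent model runs (hypothetical).
agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
print(solve_with_agents("What is 6 * 7?", agents))  # prints "42"
```

Majority voting only works when answers can be compared exactly; for open-ended tasks a separate judging step would be needed instead.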
Roadmap
August: Grok 4 Code (programming‑enhanced version).
September: Multimodal agents.
October: Video generation model.
Conclusion
With its large context, multimodal fusion, and advanced tool integration, Grok 4 positions xAI alongside GPT‑5 and Claude 4 Opus in the top tier of large‑model competition, though pricing remains a factor for broader adoption.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
