What 2025 Tells Us About the Future of Large Language Models
The 2025 LLM year‑in‑review highlights paradigm shifts such as RLVR training, uneven "saw‑tooth" intelligence, the rise of Cursor‑style applications, Claude Code agents running locally, Vibe Coding, and the Nano Banana GUI revolution, concluding that only about 10% of current models' potential is being tapped.
Karpathy recently published a year‑in‑review of large language model (LLM) progress in 2025, noting several surprising paradigm shifts.
RLVR training: uses verifiable rewards to let LLMs autonomously evolve reasoning abilities, consuming compute previously devoted to pre‑training.
Saw‑tooth intelligence: models excel at math and code yet perform at an "elementary‑school" level on common‑sense tasks, showing highly uneven capabilities.
Application‑layer rise: the Cursor paradigm demonstrates that vertically oriented LLM applications can deliver more commercial value than generic models.
Local agents: Claude Code runs as a "computer‑resident spirit", leveraging private data and low‑latency interaction.
Vibe Coding: code becomes a free, temporary, plastic medium, turning programming into a creative activity for a broader audience.
GUI revolution (Nano Banana): text‑only interaction is giving way to visual, image‑based, and multimodal interfaces, akin to the shift from CLI to GUI in personal computing.
1. RLVR – Reinforcement Learning with Verifiable Rewards
Traditional LLM production pipelines consist of pre‑training (≈GPT‑2/3 era), supervised fine‑tuning (≈InstructGPT), and RLHF. In 2025, RLVR emerges as a new stage: LLMs are trained in environments with automatically verifiable rewards (e.g., math or coding puzzles). This longer‑duration optimization consumes the compute previously allocated to pre‑training and yields a “long‑reasoning‑chain” control knob that can extend reasoning depth and thinking time.
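To make the idea concrete, here is a minimal Python sketch of what an automatically verifiable reward for a math puzzle might look like; the function name and toy prompt are illustrative assumptions, not part of any lab's actual RLVR pipeline.

```python
import re

def verifiable_math_reward(model_answer: str, ground_truth: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the model's final numeric answer matches the known solution, else 0.0.

    Unlike RLHF, no human preference model is involved: the reward is computed
    automatically, so the policy can be optimized for much longer.
    """
    # Pull the last number out of the model's chain-of-thought output.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_answer)
    if not numbers:
        return 0.0
    try:
        return 1.0 if abs(float(numbers[-1]) - ground_truth) <= tol else 0.0
    except ValueError:
        return 0.0

# Toy usage: score two sampled completions for the prompt "What is 17 * 24?"
print(verifiable_math_reward("17 * 24 = 408, so the answer is 408", 408))  # 1.0
print(verifiable_math_reward("I think the answer is 398", 408))            # 0.0
```

Because such checks need no human in the loop, they can be run millions of times, which is what lets RLVR absorb compute on the scale previously reserved for pre‑training.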
Reference: 375 papers on post‑training techniques for reasoning LLMs (URL: https://mp.weixin.qq.com/s?__biz=Mzk0MTYzMzMxMA==&mid=2247493398&idx=1&sn=1b09dc2bcac146d1283a2102b49642b6#wechat_redirect)
2. Ghost vs. Animal – Saw‑tooth Intelligence
The author likens LLMs to “summoned ghosts” rather than cultivated animals. Their training objectives (imitating text, solving puzzles, gaining up‑votes) differ fundamentally from human survival‑oriented cognition, resulting in a mix of encyclopedic brilliance and naïve, “elementary‑school” mistakes.
Benchmarks are increasingly unreliable because they can be gamed with RLVR or synthetic data, producing sharp spikes in specific abilities without genuine general intelligence. Even when a model dominates benchmarks, it may still be far from AGI.
3. Cursor – A New LLM Application Layer
Cursor exemplifies a vertical LLM application that bundles and orchestrates model calls for specific domains. Its characteristics, illustrated in the sketch after this list, include:
Context engineering to shape model behavior.
Backend DAG orchestration that balances performance and cost.
Domain‑specific GUIs for end‑users.
Adjustable autonomy sliders for controllable output.
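The sketch below illustrates the DAG‑orchestration idea with hypothetical stage names and a stand‑in call_model function; Cursor's real pipeline is not public, so this only shows the pattern of routing cheap and expensive model calls through dependency‑ordered stages.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Stage:
    """One node in the orchestration DAG: a named model call with dependencies."""
    name: str
    run: Callable[[dict], str]
    deps: list[str] = field(default_factory=list)

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real LLM API call (hypothetical).
    return f"[{model}] {prompt[:40]}..."

def run_dag(stages: list[Stage], user_request: str) -> dict:
    """Execute stages in dependency order, passing earlier outputs as context."""
    outputs: dict[str, str] = {"request": user_request}
    done: set[str] = set()
    while len(done) < len(stages):
        for stage in stages:
            if stage.name in done or any(d not in done for d in stage.deps):
                continue
            outputs[stage.name] = stage.run(outputs)
            done.add(stage.name)
    return outputs

# Cheap model gathers context, expensive model writes the edit, cheap model verifies.
pipeline = [
    Stage("retrieve", lambda o: call_model("small-fast", f"find files relevant to: {o['request']}")),
    Stage("edit",     lambda o: call_model("large-slow", f"rewrite code given {o['retrieve']}"), deps=["retrieve"]),
    Stage("verify",   lambda o: call_model("small-fast", f"lint and sanity-check {o['edit']}"), deps=["edit"]),
]
print(run_dag(pipeline, "rename the config loader and update its call sites"))
```

The performance/cost balance comes from which model each stage names: routing and verification can use a small, fast model while only the edit step pays for the large one.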
More details: https://mp.weixin.qq.com/s?__biz=Mzk0MTYzMzMxMA==&mid=2247490759&idx=1&sn=5193358a090c71b49a6474fb4514ec35#wechat_redirect
4. Claude Code – AI Living on Your Computer
Claude Code (CC) is the first convincing LLM agent that runs locally, using the user’s private environment, data, and context. It chains tool use and reasoning in a loop to solve complex problems.
Key distinction: the value lies not in where the AI runs (cloud vs. local) but in the already‑running computer’s installed software, data, keys, and low‑latency interaction.
Technical stack overview (URL): https://mp.weixin.qq.com/s?__biz=Mzk0MTYzMzMxMA==&mid=2247495528&idx=2&sn=cb7dd781b9c132254af9946a25e20b93#wechat_redirect
Anthropic packaged CC as a minimal CLI, turning AI from a web service into a "computer‑resident spirit".
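Below is a minimal sketch of such a tool‑use‑and‑reasoning loop; the tool set and the llm stand‑in are hypothetical illustrations, not Anthropic's implementation.

```python
import subprocess
from pathlib import Path

# Hypothetical local tools the agent can invoke; the value of running on the
# user's machine is that these touch real files, installed software, and keys.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "run_shell": lambda cmd: subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout,
}

def llm(transcript: str) -> dict:
    # Stand-in for the model call (hypothetical); a real agent would send the
    # transcript to an LLM API and parse a structured reply out of it.
    if "[run_shell" not in transcript:
        return {"tool": "run_shell", "arg": "ls"}
    return {"final": "Listed the working directory; a real model would now reason over it."}

def agent_loop(task: str, max_steps: int = 10) -> str:
    """Alternate between reasoning (a model call) and acting (a local tool call)."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        decision = llm(transcript)                      # model decides what to do next
        if decision.get("final"):                       # done: return the answer
            return decision["final"]
        tool, arg = decision["tool"], decision["arg"]   # otherwise run the requested tool
        result = TOOLS[tool](arg)
        transcript += f"\n[{tool}({arg!r})] -> {result[:500]}"
    return "Stopped: step budget exhausted."

print(agent_loop("What files are in my working directory?"))
```

The loop itself is simple; what makes it powerful locally is that every tool call lands on the user's real environment rather than a sandboxed cloud copy of it.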
5. Vibe Coding – Programming as a Flexible Medium
Vibe Coding describes a future where code becomes a free, temporary, and plastic medium, allowing anyone to create software without deep expertise. It enables rapid prototyping, bug‑hunting, and the creation of disposable applications, reshaping the software development landscape.
Comprehensive review (URL): https://mp.weixin.qq.com/s?__biz=Mzk0MTYzMzMxMA==&mid=2247498057&idx=1&sn=04ef65879b687f4a196581d93087d014#wechat_redirect
6. Nano Banana – The LLM GUI Revolution
Google’s Gemini Nano Banana model exemplifies the next core computing paradigm: moving from text‑only interaction to visual, image‑rich, and multimodal interfaces. Just as personal computers evolved from CLI to GUI, LLMs are transitioning to “LLM‑GUI” where text is the machine’s native language but not the user’s preferred medium.
Unified multimodal capabilities combine text, image, and world knowledge within model weights.
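As a small illustration of text‑in, image‑out interaction, here is a hedged sketch assuming the google-genai Python SDK; the model id and the response‑handling details below are assumptions and should be checked against current documentation.

```python
# A sketch of multimodal interaction, assuming the google-genai Python SDK.
from google import genai

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image",   # "Nano Banana"; verify the current model id
    contents=["Sketch a simple dashboard layout for a weather app"],
)

# The reply can mix text parts and generated image parts in one response.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:                       # image bytes returned inline
        with open("dashboard.png", "wb") as f:
            f.write(part.inline_data.data)
```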
Further reading (URL): https://mp.weixin.qq.com/s?__biz=Mzk0MTYzMzMxMA==&mid=2247501039&idx=1&sn=6912c0321f6ee917d5ffd1cf44ecb2c5#wechat_redirect
Conclusion
2025 was an exciting, surprising year for LLMs. They behave both smarter and dumber than expected, yet remain vastly under‑utilized: only about 10% of their potential is being tapped. The industry's continued evolution promises deeper reasoning, richer multimodal interaction, and more practical, locally run agents.
Source: https://karpathy.bearblog.dev/year-in-review-2025/