The Future of AI Agents: From Prompt‑Driven Workflows to Model‑as‑Product and Reinforcement‑Learning‑Powered Agents
The article argues that the next wave of AI agents will shift from brittle, prompt‑driven workflows like Manus to truly autonomous, model‑centric agents trained with reinforcement learning and reasoning, exemplified by OpenAI's DeepResearch and Anthropic's Claude 3.7 Sonnet, and that the current API‑driven business model will collapse along the way.
Alexander Doria emphasizes that the future direction of AI agents lies in improving the model itself rather than relying on pre‑designed workflows; he cites Manus as a short‑term but ultimately limited example of prompt‑driven agents that cannot handle long‑term planning or multi‑step reasoning.
The next generation of LLM agents will combine reinforcement learning (RL) with reasoning, as demonstrated by OpenAI's DeepResearch and Anthropic's Claude 3.7 Sonnet, enabling autonomous task execution, dynamic planning, and tool selection without hand‑crafted prompt scaffolding.
Recent trends show that generic model scaling faces diminishing returns and rising compute costs, while opinionated, task‑specific training (RL + reasoning) yields outsized performance gains, especially in specialized domains such as mathematics, code generation, and complex search.
DeepResearch is a research‑focused language model that internally performs web browsing, search, and report generation without external APIs, whereas other “search‑enhanced” products like Perplexity or Google’s Gemini rely on shallow integrations and lack transparent evaluation.
Anthropic defines an agent as a model that dynamically decides its own execution flow and tool usage, a capability that many current “agent” companies lack, as they still operate on static workflow pipelines.
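The distinction can be made concrete in a few lines. Below is a minimal sketch of the control‑flow difference: instead of a static pipeline that hard‑codes step order, the model itself chooses the next tool at every turn. `toy_model` and the tool names are hypothetical stand‑ins for a real LLM policy and real tools, not any actual API.

```python
def toy_model(history: list) -> dict:
    """Hypothetical stand-in for an LLM policy: inspects the trajectory
    so far and decides the next action itself."""
    if not any(step["tool"] == "search" for step in history):
        return {"tool": "search", "args": {"query": "topic"}}
    if not any(step["tool"] == "summarize" for step in history):
        return {"tool": "summarize", "args": {}}
    return {"tool": "finish", "args": {}}

# Illustrative tools; a real agent would wrap browsers, code runners, etc.
TOOLS = {
    "search": lambda args: f"results for {args['query']}",
    "summarize": lambda args: "summary of results",
}

def run_agent(model, max_steps: int = 10) -> list:
    """Dynamic loop: the model, not a fixed workflow graph, drives
    control flow and decides when to stop."""
    history = []
    for _ in range(max_steps):
        action = model(history)
        if action["tool"] == "finish":
            break
        observation = TOOLS[action["tool"]](action["args"])
        history.append({"tool": action["tool"], "observation": observation})
    return history

trace = run_agent(toy_model)
```

In a static workflow, the `search → summarize` ordering would live in application code; here it emerges from the model's per‑step decisions, which is the property Anthropic's definition singles out.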
The article predicts that within 2‑3 years closed‑source AI providers will stop offering APIs and instead sell the model itself as the product, ending the current API‑economy and forcing application‑layer companies to either train their own models or become obsolete.
Reinforcement learning’s market potential is severely undervalued; despite breakthroughs, investment in RL‑driven model training remains scarce, even as firms like Prime Intellect, EleutherAI, Jina, and HuggingFace’s training teams push the frontier.
Scaling RL‑based agents for tasks like web search requires massive simulated environments, synthetic data pipelines, and efficient reward engineering (e.g., GRPO), but the computational cost is shifting from GPU cycles to data bandwidth and environment simulation.
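To illustrate the reward‑engineering piece, here is a minimal sketch of the group‑relative advantage computation at the core of GRPO: several completions are sampled per prompt, and each completion's reward is normalized against its group's statistics, which removes the need for a separate learned critic. This shows only the advantage step, not the full policy update.

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as used in GRPO: normalize each sampled
    completion's reward by the group mean and standard deviation, so no
    value/critic network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# One prompt, a group of G=4 sampled completions scored by a reward function
# (e.g. 1.0 if the answer verified, 0.0 otherwise):
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

The normalized advantages then weight each completion's token log‑probabilities in the policy‑gradient update; the cheap, verifiable rewards are why domains like math and code have been the first to benefit.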
Ultimately, true agents that do not depend on handcrafted prompts will transform search, networking, finance, and other domains by autonomously planning, executing, and learning from multi‑step interactions, heralding a paradigm shift from application‑centric AI to model‑centric AI.