
Why Cutting Claude Subscriptions Won’t Fix Token Costs – Smarter Compute Is the Answer

Anthropic’s decision to block third‑party Agent frameworks from Claude’s subscription plans exposes unsustainable token pricing and the massive compute waste caused by poor context handling. The real solution, Xiaomi’s Luo Fuli argues, lies in smarter, more efficient agent design rather than cheaper tokens.

Machine Heart

Apr 6, 2026


Anthropic has announced that Claude Pro and Max subscribers can no longer spend their quota through third‑party Agent frameworks such as OpenClaw, forcing those users onto pay‑per‑use APIs. Commenting on the change, Luo Fuli, head of Xiaomi’s MiMo large‑model team, argues that the core issue in the emerging Agent era is not token price but inefficient use of compute.

In Luo’s view, Claude Code’s subscription model is a well‑designed compute‑balancing system that likely operates at a loss unless API margins run as high as ten to twenty times. Third‑party frameworks like OpenClaw trigger many low‑value tool calls per user request, each carrying a context that often exceeds 100k tokens, which leads to extreme waste.

When converted to API pricing, the real cost per request can be dozens of times the subscription price – a hidden pitfall.
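To make that conversion concrete, here is a rough back-of-the-envelope sketch in Python. The number of tool calls, the output size, and the per-million-token prices are hypothetical placeholders chosen for illustration; they are not figures from Luo's post or from any vendor's price list. Only the 100k-token context size echoes the text above.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Shape of one user request as an agent framework executes it."""
    tool_calls: int       # model round trips triggered by a single user request
    context_tokens: int   # input tokens resent to the model on every round trip
    output_tokens: int    # tokens generated on every round trip


def api_cost_usd(profile: RequestProfile,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Pay-per-use cost of one request, with prices quoted per million tokens."""
    total_input = profile.tool_calls * profile.context_tokens
    total_output = profile.tool_calls * profile.output_tokens
    return (total_input * input_price_per_mtok
            + total_output * output_price_per_mtok) / 1_000_000


# Hypothetical request: 20 tool calls, each resending a 100k-token context.
profile = RequestProfile(tool_calls=20, context_tokens=100_000, output_tokens=500)

# Hypothetical prices (USD per million tokens), for illustration only.
cost = api_cost_usd(profile, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"{cost:.2f}")  # prints 6.15
```

Under these assumed numbers, almost all of the cost comes from resending the context on every call rather than from the generated output, which is the pattern the article describes as waste.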

Luo sees the short‑term pain for users forced onto usage‑based pricing as a catalyst for engineering discipline. Developers will be pressured to improve context management, increase prompt‑cache hit rates, and cut unnecessary token consumption.
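Why the prompt-cache hit rate matters can be shown with a small sketch. It assumes a simplified billing model in which cached input tokens are charged at a fraction of the normal input price; the function name, the price, and the 0.1 multiplier are all hypothetical, and real providers may also charge extra for writing to the cache, which this sketch ignores.

```python
def effective_input_cost_usd(input_tokens: int,
                             price_per_mtok: float,
                             cache_hit_rate: float,
                             cached_price_multiplier: float) -> float:
    """Input cost when a share of the tokens is served from a prompt cache.

    cache_hit_rate: fraction of input tokens read from cache (0.0 to 1.0).
    cached_price_multiplier: cached-token price as a fraction of the normal price.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * price_per_mtok
            + cached * price_per_mtok * cached_price_multiplier) / 1_000_000


# Hypothetical workload: 2 million input tokens at an assumed $3 per million.
no_cache = effective_input_cost_usd(2_000_000, 3.0, 0.0, 0.1)
with_cache = effective_input_cost_usd(2_000_000, 3.0, 0.9, 0.1)
print(f"{no_cache:.2f} {with_cache:.2f}")  # prints 6.00 1.14
```

In this toy model a 90% hit rate cuts the input bill by roughly a factor of five, which suggests why a framework that keeps its prompt prefix stable between calls can be far cheaper than one that reshuffles context on every turn.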

Third‑party frameworks can still call Claude via API, but without the subscription “free ride.” The cost surge will drive better engineering practices.

She warns large‑model providers against a price war that sells cheap tokens while leaving doors open to wasteful agents, calling it a trap that Anthropic has just escaped.

Before slashing token prices, vendors should weigh the downstream effects: users lose time to low‑quality agents and unstable inference services, which hurts both user experience and retention.

Luo introduces Xiaomi’s new MiMo Token Plan, which supports third‑party framework integration but bills by token quota, mirroring Claude’s extra‑usage package. The plan prioritizes stable, high‑quality model delivery over enticing users into impulse purchases.

MiMo Token Plan: token‑quota billing, third‑party access allowed, focused on long‑term stable service.

The article highlights three community takeaways:

It’s an AI economics rewrite, not just a pricing dispute. Unit cost depends on the combination of model, framework, and context management. Anthropic’s move creates natural selection pressure on Agent frameworks.

Compute waste, not token price, is the real problem. Inefficient framework design, massive context windows, and redundant calls burn money without delivering value.
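One way a framework might cut redundant calls is to reuse the result of an identical tool call instead of running it again and feeding the output back through the model. The sketch below is a minimal illustration under stated assumptions: the ToolCallCache class and the fake_tool function are hypothetical and not part of any real framework, and caching like this is only safe for read-only tools whose results do not change between calls.

```python
import json
from typing import Any, Callable, Dict, Tuple


class ToolCallCache:
    """Reuses results of identical (tool name, arguments) calls."""

    def __init__(self, run_tool: Callable[[str, dict], Any]) -> None:
        self._run_tool = run_tool
        self._results: Dict[Tuple[str, str], Any] = {}
        self.hits = 0
        self.misses = 0

    def call(self, name: str, args: dict) -> Any:
        # Serialize arguments with sorted keys so that equal dicts
        # produce the same cache key regardless of insertion order.
        key = (name, json.dumps(args, sort_keys=True))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = self._run_tool(name, args)
        self._results[key] = result
        return result


# Hypothetical tool runner standing in for a real file-reading tool.
executed = []

def fake_tool(name: str, args: dict) -> str:
    executed.append((name, args))
    return f"contents of {args['path']}"


cache = ToolCallCache(fake_tool)
cache.call("read_file", {"path": "a.txt"})
cache.call("read_file", {"path": "a.txt"})  # duplicate, served from cache
cache.call("read_file", {"path": "b.txt"})
print(cache.hits, cache.misses)  # prints 1 2
```

The hit and miss counters are deliberate: tracking them gives a framework author a direct measure of how many calls were redundant in the first place.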

The market shakeout has begun, but the outcome is still undecided. Developers question whether third‑party frameworks can close the efficiency gap quickly enough or whether users will return to Anthropic’s native Claude Code.

Developers stress that clear token‑quota limits, rather than vague access rights, will foster better product behavior. The discussion signals a shift from “burning compute” toward “fine‑grained engineering architecture” across the AI software ecosystem.
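What an explicit token quota could look like inside an agent loop can be sketched in a few lines. The TokenBudget class and its limit are hypothetical and do not describe how the MiMo Token Plan or Claude's billing is implemented; the point is only that a hard, visible limit forces the framework to decide what is worth spending tokens on.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the quota."""


class TokenBudget:
    """Tracks token spend against a fixed quota."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, tokens: int) -> None:
        """Record a spend, refusing it if it would exceed the quota."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"need {tokens} tokens, only {self.remaining} remaining"
            )
        self.used += tokens


# Hypothetical quota of 250k tokens and calls of 100k tokens each.
budget = TokenBudget(limit=250_000)
completed_calls = 0
for _ in range(5):
    try:
        budget.charge(100_000)
    except TokenBudgetExceeded:
        break  # stop the agent loop instead of silently overspending
    completed_calls += 1

print(completed_calls, budget.remaining)  # prints 2 50000
```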

Overall, Luo’s post serves as a forward‑looking signal about AI software engineering’s core pain points and the need for collaborative evolution between more efficient Agent frameworks and stronger models.

