When Stronger Developers Earn Less: Token‑Based Pricing Turns AI Coding Tools Upside‑Down
This article analyzes how recent price hikes and the shift from flat-rate subscriptions to token-based billing by GitHub Copilot, Alibaba Cloud, and other AI providers are leaving developers, especially the most productive ones, facing higher token consumption, diminishing marginal returns, and rising out-of-pocket costs.
GitHub Copilot adjustments
On 2026‑04‑20 GitHub announced four changes: (1) pause new registrations for Copilot Pro, Pro+ and student plans; (2) introduce session limits and weekly token limits to curb parallel long‑running sessions; (3) remove Opus 4.5 and 4.6 from Pro+, keeping only Opus 4.7; (4) switch billing from request‑based to token‑based.
Other providers’ pricing responses
Anthropic added usage caps, peak‑time routing, and identity verification.
Google tightened Gemini CLI limits and introduced tiered pricing.
OpenAI revised Codex pricing, raising costs for long‑context, multi‑round tool‑calling scenarios.
Chinese cloud price hikes (April–May 2026)
Alibaba Cloud AI compute products rose by as much as 34%; the 40 CNY/month Coding Plan Lite was discontinued.
Baidu Smart Cloud increased AI compute fees by 5‑30%.
Tencent Cloud applied a uniform +5% increase on 2026‑05‑09.
From Coding Plan to Token Plan
Coding Plan used a fixed monthly fee plus request or credit limits, which encouraged developers to "milk" AI assistance. The surge in large-model usage in 2025 and the 2026 OpenClaw boom exposed a flaw: fixed fees cannot sustain highly parallel, long-session workloads, so providers lose money as usage grows.
Token Plan meters consumption at the token level, aligning charges with actual resource use. In China, low electricity costs previously subsidised cheap compute, but that subsidy is no longer viable.
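The contrast between the two plans can be sketched numerically. The Python snippet below uses hypothetical prices (a 40 CNY flat fee and an assumed blended per-token rate, neither taken from any provider's actual price list) to show where metered billing overtakes a flat plan:

```python
# Illustrative comparison of flat-rate vs. token-metered billing.
# Both prices are placeholder assumptions chosen to show the crossover effect.

FLAT_FEE = 40.0             # CNY/month, flat-rate plan
PRICE_PER_1K_TOKENS = 0.01  # CNY, assumed blended input/output rate

def monthly_cost_flat(tokens_per_month: int) -> float:
    """Flat plan: cost is constant regardless of usage."""
    return FLAT_FEE

def monthly_cost_metered(tokens_per_month: int) -> float:
    """Token plan: cost scales linearly with consumption."""
    return tokens_per_month / 1000 * PRICE_PER_1K_TOKENS

for tokens in (1_000_000, 4_000_000, 30_000_000):
    print(f"{tokens:>12,} tokens/month  "
          f"flat={monthly_cost_flat(tokens):7.2f} CNY  "
          f"metered={monthly_cost_metered(tokens):7.2f} CNY")
```

At these assumed rates a light user (1M tokens/month) is better off metered, the two plans cross near 4M tokens/month, and a heavy user pays many times the old flat fee, which is precisely the shift the article describes.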
Demand‑supply context
IDC forecasts global annual token consumption to grow more than 3 × 10⁸-fold by 2030. China's daily token calls rose from 1 × 10¹² in early 2024 to 1 × 10¹⁴ by the end of 2025 (≈100-fold). On the supply side, GPU chip capacity is constrained and manufacturers prioritise large-scale customers, causing data-centre project delays. Model providers such as Anthropic and OpenAI face pressure to reduce losses while scaling.
Developers’ cost pressures under Token Plan
Higher token consumption: complex logic, longer context, and frequent interactions directly increase token spend.
Diminishing marginal returns: per‑token billing offsets efficiency gains from faster coding; more usage leads to higher total cost.
Escalating out‑of‑pocket expenses: monthly spend can jump from tens of yuan to hundreds or thousands for heavy users.
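The first two pressures compound quietly: because each round of a session typically resends the full conversation history as context, total token spend grows roughly quadratically with session length. A minimal sketch under that assumption (the per-turn sizes are illustrative, not measured):

```python
# Why long, multi-round sessions consume tokens super-linearly:
# each round is assumed to resend the full conversation history as input.
# Per-turn token counts are illustrative, not any provider's accounting.

def session_tokens(rounds: int, prompt_tokens: int, reply_tokens: int) -> int:
    """Total tokens billed over a session where every round resends history."""
    total = 0
    history = 0
    for _ in range(rounds):
        total += history + prompt_tokens + reply_tokens  # context + new turn
        history += prompt_tokens + reply_tokens          # history keeps growing
    return total

# A short session vs. a long debugging session with identical per-turn sizes:
print(session_tokens(rounds=3, prompt_tokens=200, reply_tokens=500))   # 4200
print(session_tokens(rounds=30, prompt_tokens=200, reply_tokens=500))  # 325500
```

Ten times as many rounds costs about 77 times as many tokens here, which is why heavy interactive users feel metered billing far more sharply than the raw round count suggests.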
Local deployment as a cost‑reduction path
Open‑source models such as Qwen‑3 and Gemma‑4 have approached the performance of cloud flagship models. For a workload of 1 M tokens per day, a self‑hosted solution can reduce monthly cost from several hundred yuan to near zero, with electricity as the primary expense. Limitations include smaller parameter counts, difficulty with ultra‑long context, multi‑step reasoning, and real‑time internet access, which still require occasional cloud calls.
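A back-of-envelope version of that comparison for the 1M tokens/day workload, using placeholder assumptions for the cloud per-token price, GPU power draw, duty cycle, and electricity rate (none are quoted figures):

```python
# Back-of-envelope monthly cost for a 1M-tokens/day coding workload.
# Cloud price, GPU power draw, hours of use, and electricity rate are
# all placeholder assumptions for illustration.

TOKENS_PER_DAY = 1_000_000
DAYS = 30

# Cloud: assumed blended API price of 0.02 CNY per 1K tokens.
cloud_price_per_1k = 0.02
cloud_monthly = TOKENS_PER_DAY * DAYS / 1000 * cloud_price_per_1k

# Self-hosted: electricity only; assume a 450 W GPU box running
# 8 h/day at a residential rate of 0.6 CNY/kWh.
kwh_per_day = 0.45 * 8
local_monthly = kwh_per_day * DAYS * 0.6

print(f"cloud ≈ {cloud_monthly:.0f} CNY/month")
print(f"local ≈ {local_monthly:.0f} CNY/month (electricity only)")
```

At these assumed rates the cloud bill lands in the "several hundred yuan" range the article cites, while the self-hosted path is dominated by a modest electricity cost; the hardware's purchase price is deliberately left out of this sketch.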
Hybrid and cheaper‑model strategies
Using local models for routine completions and reserving cloud APIs for heavyweight tasks can cut up to 80 % of daily spend. Switching to domestic models (e.g., GLM, Qwen) that charge an order of magnitude less per token provides additional savings.
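One way to sketch such a routing policy; the classification rule, the thresholds, and the task mix below are illustrative assumptions, not a measured workload or any tool's actual router:

```python
# Minimal sketch of a hybrid routing policy: routine completions go to a
# local model, heavyweight tasks to a cloud API. Thresholds and the task
# mix are assumptions chosen only to illustrate the savings mechanism.

from dataclasses import dataclass

@dataclass
class Task:
    tokens: int
    needs_long_context: bool = False
    needs_web: bool = False

def route(task: Task) -> str:
    """Send a task to the cloud only when a local model can't handle it."""
    if task.needs_long_context or task.needs_web or task.tokens > 8_000:
        return "cloud"
    return "local"

# A day's assumed workload: many small completions plus one repo-wide task.
tasks = [Task(tokens=300) for _ in range(200)]
tasks.append(Task(tokens=20_000, needs_long_context=True))

cloud_tokens = sum(t.tokens for t in tasks if route(t) == "cloud")
total_tokens = sum(t.tokens for t in tasks)
saved = 1 - cloud_tokens / total_tokens
print(f"tokens kept off the cloud API: {saved:.0%}")  # 75%
```

With this assumed mix, three quarters of the day's tokens never hit the metered API, in line with the "up to 80%" figure above; routing the remaining cloud traffic to a cheaper domestic model would stack a further saving on top.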
References
GitHub Copilot official announcement: https://github.blog/news-insights/company-news/changes-to-github-copilot-individual-plans/
V2EX discussion: https://www.v2ex.com/t/1207335
Alibaba Cloud Coding Plan Lite notice: https://www.aliyun.com/notice/118175
Google Gemini CLI policy: https://github.com/google-gemini/gemini-cli/discussions/23799
OpenAI Codex pricing update: https://www.openai.com/zh-Hans-CN/index/codex-flexible-pricing-for-teams/
Source: reprinted with authorization from the OSC开源社区 WeChat account (ID: oschina2013). Author: 大东.