Why Tokens Have Turned Into ‘Gold Water’: How Model Providers Are Raking in AI Profits
The AI ecosystem is witnessing a massive shift where exploding hardware costs and soaring token demand have turned tokens into a high‑margin commodity, allowing model providers to capture most of the profit while Nvidia and TSMC keep prices flat for strategic reasons.
1. Hardware price surge and model factories profit
In recent months the AI sector has entered a frenzy: memory prices have jumped sixfold, Micron and Western Digital shares have risen more than 200%, and H100 rental rates have surged 40% as new cloud providers scramble for capacity. TSMC’s 3 nm capacity is fully booked and DRAM utilization exceeds 90%.
Despite this, the biggest profit makers are not the hardware sellers but the model providers. Anthropic’s ARR leapt from $9 billion to $44 billion, with inference gross margin climbing from 38% to over 70%. Hardware buyers spend heavily on GPUs and memory, yet the tokens generated on that hardware are sold on at a huge markup.
2. Why tokens have become “gold water”
By the end of 2025, AI is expected to reach a wealth inflection point as Agent AI becomes widely usable: routine code-writing, reporting, and other white-collar tasks can be completed in minutes for a few dollars’ worth of tokens. Enterprises may spend $10 million annually on tokens to gain a competitive edge.
Model providers employ several profit‑maximizing tactics:
Input-to-output token ratios of 300:1 with cache hit rates above 90%, so most of the workload is cached input that is cheap to serve, driving costs extremely low.
Production costs as low as $0.99 per million tokens, against selling prices of $5–$25 per million, creating massive margins.
Premium offerings such as Opus Fast and Mythos are priced 5–6× higher, yet enterprises continue to purchase them.
Thus, token costs have plummeted while their market value has exploded, allowing model factories to harvest the majority of industry profits.
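The margin arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual cost model: the $0.99 cost and the $5–$25 price range come from the figures above, while the function names and the per-token serving costs used in the blended-cost example (1.0, 0.1, and 5.0 dollars per million) are hypothetical values chosen only to show how a 300:1 input-to-output ratio and a 90% cache hit rate pull the average cost down.

```python
def gross_margin(price_per_million: float, cost_per_million: float) -> float:
    """Return gross margin as a fraction of the selling price."""
    if price_per_million <= 0:
        raise ValueError("price must be positive")
    return (price_per_million - cost_per_million) / price_per_million


def blended_cost_per_million(
    uncached_input_cost: float,
    cached_input_cost: float,
    output_cost: float,
    input_output_ratio: float,
    cache_hit_rate: float,
) -> float:
    """Average serving cost per million tokens across inputs and outputs.

    All cost arguments are dollars per million tokens. The split into
    cached input, uncached input, and output costs is an assumption made
    for illustration; the article does not give these values.
    """
    # Average input cost, weighted by how often the cache is hit.
    input_cost = (
        cache_hit_rate * cached_input_cost
        + (1 - cache_hit_rate) * uncached_input_cost
    )
    # For every output token there are `input_output_ratio` input tokens.
    total_tokens = input_output_ratio + 1
    return (input_output_ratio * input_cost + output_cost) / total_tokens


if __name__ == "__main__":
    cost = 0.99  # dollars per million tokens, from the article
    for price in (5.0, 25.0):  # price range from the article
        print(f"price ${price:.2f}: margin {gross_margin(price, cost):.1%}")

    # Hypothetical serving costs, NOT taken from the article.
    blended = blended_cost_per_million(
        uncached_input_cost=1.0,
        cached_input_cost=0.1,
        output_cost=5.0,
        input_output_ratio=300,
        cache_hit_rate=0.9,
    )
    print(f"blended cost per million tokens: ${blended:.3f}")
```

At a $0.99 cost, the quoted price range works out to a gross margin of roughly 80% at $5 and roughly 96% at $25. With the hypothetical serving costs, the blended cost lands close to the cached-input price, which is the mechanism the first tactic relies on.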
3. The puzzling behavior of Nvidia and TSMC
Logically, with AI demand skyrocketing, upstream hardware vendors should raise prices. However, Nvidia’s Blackwell chips deliver performance gains of dozens of times yet remain modestly priced, and TSMC’s 3 nm capacity, despite being fully booked, has seen little price increase.
Reasons cited include:
Fear of antitrust scrutiny, as AI compute becomes a “central bank” of the industry.
A strategy to nurture the ecosystem by keeping downstream margins high, thereby expanding the overall market.
Short‑term supply constraints make price hikes risky, potentially alienating customers.
Consequently, model providers reap the bulk of profits while Nvidia and TSMC appear as “philanthropists” in the short term.
4. The ultimate truth: value migration to model providers
From 2023 to 2025, most AI spending was absorbed by compute, power, and memory costs. After 2026, value will flow almost entirely to model providers, for three reasons:
Hardware performance is soaring, diluting cost per token (Blackwell generates 30× more tokens than H100; software optimizations add another 14× throughput).
Agent AI drives exponential token demand, creating a supply bottleneck that gives model providers absolute pricing power; open‑source models cannot compete.
Closed‑source models maintain a deep moat, outclassing open‑source alternatives in real‑world scenarios, while compute scarcity prevents price wars.
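The cost-dilution point can be made concrete with a short Python sketch. It assumes the two gains quoted above (30× from Blackwell over H100, 14× from software optimizations) multiply, and that the hourly cost of running the hardware stays fixed unless stated otherwise. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def combined_throughput_gain(hardware_gain: float, software_gain: float) -> float:
    """Combined tokens-per-hour multiplier, assuming the gains multiply."""
    return hardware_gain * software_gain


def relative_cost_per_token(
    hardware_gain: float,
    software_gain: float,
    hardware_cost_ratio: float = 1.0,
) -> float:
    """Cost per token relative to the baseline (1.0 means unchanged).

    hardware_cost_ratio is the new hourly hardware cost divided by the
    old one; the default of 1.0 assumes hardware cost does not change.
    """
    gain = combined_throughput_gain(hardware_gain, software_gain)
    if gain <= 0:
        raise ValueError("throughput gain must be positive")
    return hardware_cost_ratio / gain


if __name__ == "__main__":
    gain = combined_throughput_gain(30, 14)
    print(f"combined throughput gain: {gain:.0f}x")
    print(f"relative cost per token: {relative_cost_per_token(30, 14):.4f}")
    # Hypothetical case: the new hardware costs twice as much per hour.
    print(f"with 2x hardware cost: {relative_cost_per_token(30, 14, 2.0):.4f}")
```

Under these assumptions the combined gain is 420×, so cost per token falls to well under 1% of the baseline even if the hardware itself becomes noticeably more expensive per hour.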
5. Summary of the AI profit landscape
Hardware buyers bear rising hardware, electricity, and floor-space costs while earning modest returns.
Hardware sellers earn modest profits and hold back from taking large margins.
Model factories act as “cash‑cow” token printers, capturing the vast majority of profits.
The bluntest truth in today’s AI sector is that, while hardware prices soar and compute capacity is fiercely contested, the quiet profiteers are the companies that sell models and tokens.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
