Chinese LLMs Surge Ahead: Token Usage Overtakes U.S. Models in 2026
In March 2026, OpenRouter recorded 9.55 trillion tokens consumed in a single week, with Chinese models occupying six of the top-10 slots by usage, Qwen surpassing 1 billion downloads on Hugging Face, and cost advantages that let domestic LLMs undercut U.S. counterparts on price while matching them on performance.
1. Token Momentum
OpenRouter’s real‑time snapshot for the week of March 16, 2026 shows total token consumption reaching 9.55 trillion, with the weekly pace climbing toward 10.8 trillion. At this rate, the coming week is projected to break the 10 trillion threshold.
2. Ranking Dominance
Among the top‑10 models by token volume, Chinese models hold six seats (including Hunter Alpha, MiniMax M2.5, Step 3.5 Flash, DeepSeek V3.2, and GLM‑5 Turbo), while U.S. models occupy the remaining four. The leading U.S. model, Claude Sonnet 4.6 (518 billion tokens), trails Step 3.5 Flash by roughly 100 billion.
In the top‑20, Chinese models claim seven positions. MiniMax M2.5 leads at 7.98 trillion tokens (+304% growth), followed by Step 3.5 Flash at 3.44 trillion (+807%), GLM‑5 at 1.89 trillion (+142%), Hunter Alpha at 1.57 trillion, and MiMo‑V2‑Flash at 763 billion.
3. Open‑Source Ecosystem
By September 2025, Alibaba’s Qwen family had overtaken Meta’s Llama as the most downloaded model series on Hugging Face, passing 1 billion downloads by January 2026. More than 200,000 derivative models have been built on Qwen, and Chinese‑origin bases account for 63% of new fine‑tuned models (Stanford HAI, September 2025).
Developer download share shifted to 17.1% for Chinese models versus 15.8% for U.S. models over August 2024 to August 2025 (Stanford HAI).
4. Development Milestones
DeepSeek R1, disclosed in January 2025, cost roughly $560,000 to train, matching OpenAI o1’s performance while being fully open‑source. Its reinforcement‑learning phase alone cost $29,400, using 512 H800 chips over 80 hours (per its Nature paper).
Qwen’s rapid ascent on Hugging Face displaced Llama, marking the first time Chinese developers’ download share exceeded that of the United States.
In February 2026, MiniMax M2.5, Kimi K2.5, and GLM‑5 launched concurrently, each climbing to the top of OpenRouter’s API usage within a single month; Chinese models’ share of calls jumped from under 2% to a global majority.
5. Performance Landscape
While the top six positions remain dominated by U.S. closed‑source models, Chinese models appear from rank 7 onward: GLM‑5 and MiniMax M2.7 tie at rank 7 with scores of 50, seven points behind the leader; Xiaomi’s MiMo‑V2‑Pro sits at rank 10, Kimi K2.5 at 12, Alibaba’s Qwen 3.5‑397B at 16, and DeepSeek V3.2 at 21, for eight Chinese models in the top 21.
6. Cost Advantage
Open‑source AI startups running Chinese models benefit from markedly lower operating costs. MiniMax M2.7, for example, is priced at $0.53 per million tokens, whereas Claude Opus 4.6 (max) costs $10.00 per million, roughly a 19‑fold difference. Average response latency for GPT‑5.4 is reported at 164 seconds, while leading Chinese models respond within 1‑4 seconds.
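The pricing gap compounds quickly at agent-scale volumes. The sketch below applies the per-million-token prices quoted above to a hypothetical monthly workload; the volume figure and the helper function are illustrative, not from any provider's billing API.

```python
# Rough cost comparison using the per-million-token prices cited in this article.
# Model names are as reported; the 500M-token workload is a made-up example.

PRICE_PER_M_TOKENS = {
    "MiniMax M2.7": 0.53,            # USD per 1M tokens
    "Claude Opus 4.6 (max)": 10.00,  # USD per 1M tokens
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Estimated spend in USD for a given monthly token volume."""
    return PRICE_PER_M_TOKENS[model] * tokens_per_month / 1_000_000

volume = 500_000_000  # e.g. 500M tokens/month for an agent-heavy workload
cheap = monthly_cost("MiniMax M2.7", volume)
pricey = monthly_cost("Claude Opus 4.6 (max)", volume)
print(f"MiniMax M2.7: ${cheap:,.2f}  Claude Opus 4.6: ${pricey:,.2f}  "
      f"ratio: {pricey / cheap:.1f}x")
# → MiniMax M2.7: $265.00  Claude Opus 4.6: $5,000.00  ratio: 18.9x
```

At this spread, a team pushing billions of tokens a month through agent pipelines sees the price difference dominate model choice, which is consistent with the usage shift the rankings above describe.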
7. Emerging Scenarios
Programming‑related token usage on OpenRouter grew from 11% at the start of 2025 to over 50%, becoming the largest single usage category. Agent workflows now contribute more than half of total output tokens, aligning with the rapid adoption of Chinese models in these high‑throughput scenarios.
8. Strategic Factors
Many Chinese models are released under permissive Apache 2.0 or MIT licenses, allowing unrestricted commercial use, modification, and redistribution. Simultaneously, U.S. chip export controls have spurred Chinese labs to adopt more compute‑efficient architectures such as Mixture‑of‑Experts (MoE), which deliver higher effective capacity with fewer resources, reducing both training and inference costs. These two forces together create a structural advantage for domestic models in the global market.
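To make the MoE efficiency point concrete, here is a toy numpy sketch of top-k expert routing. The shapes, the top-2 choice, and the random weights are illustrative and do not correspond to any model named above; the key property shown is that each token runs through only k of E expert networks, so inference compute scales with k rather than E.

```python
import numpy as np

# Toy Mixture-of-Experts layer: a router scores E experts per token,
# and only the top-k experts actually compute. All sizes are made up.
rng = np.random.default_rng(0)
d, E, k = 8, 4, 2                      # hidden size, total experts, active experts
W_gate = rng.normal(size=(d, E))       # router (gating) weights
experts = [rng.normal(size=(d, d)) for _ in range(E)]  # per-expert weight matrices

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d) -> (tokens, d); each token uses only k of E experts."""
    logits = x @ W_gate                        # (tokens, E) router scores
    top = np.argsort(logits, axis=1)[:, -k:]   # indices of the top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())            # softmax over selected experts only
        w /= w.sum()
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])  # only k matmuls per token
    return out

y = moe_layer(rng.normal(size=(3, d)))
print(y.shape)  # → (3, 8)
```

With k fixed, adding experts grows the model's parameter count (and effective capacity) while per-token compute stays flat, which is why the architecture is attractive under chip constraints.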
9. Sources
artificialanalysis.ai, pandaily.com, completeaitraining.com, the-decoder.com, aibase.ng, vertu.com, scmp.com.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AI Engineer Programming
In the AI era, defining problems is often more important than solving them; here we explore AI's contradictions, boundaries, and possibilities.