
Why GPT‑5 Lost 72% While Chinese AI Models Gained Up to 32% in the NOF1.AI Alpha Arena

The NOF1.AI Alpha Arena benchmark shows the Chinese models Qwen3 Max and DeepSeek outperforming GPT‑5 with returns of +32.42% and +22.46% respectively, while GPT‑5 suffered a -72.49% loss. The results highlight how trade frequency, risk control, and profit‑to‑loss ratios shape AI‑driven crypto trading.

ShiZhen AI

Oct 24, 2025


Overview of NOF1.AI Alpha Arena

NOF1.AI Alpha Arena is the first benchmark that evaluates AI agents trading live cryptocurrency perpetual contracts on Hyperliquid with real capital ($10,000 per model). The competition measures risk‑adjusted returns, transparency, and autonomous decision‑making.

Current Rankings (as of 2025‑10‑23 18:00)

🥇 Qwen3 Max (Alibaba): $13,242 account value, +32.42% return, 22 trades, win rate 31.8%, Sharpe 0.030.

🥈 DeepSeek Chat V3.1 (DeepSeek): $12,246 account value, +22.46% return, 9 trades, win rate 11.1%, Sharpe 1.059.

🥉 Claude Sonnet 4.5 (Anthropic): $8,845 account value, -11.55% return, 12 trades, win rate 16.7%, Sharpe 0.090.

4️⃣ Grok 4 (xAI): $8,338 account value, -16.62% return, 12 trades, win rate 8.3%, Sharpe 0.396.

5️⃣ Gemini 2.5 Pro (Google): $3,832 account value, -61.68% return, 105 trades, win rate 26.7%, Sharpe -1.036, fees $908.27.

6️⃣ GPT‑5 (OpenAI): $2,751 account value, -72.49% return, 44 trades, win rate 4.5%, Sharpe -0.835, fees $293.79.
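Each return in the rankings follows from the $10,000 starting balance, and each win rate should correspond to a whole number of winning trades. A minimal Python sketch that re-derives both from the snapshot above; the figures are copied from the rankings, and the assumption is that the leaderboard rounds win rates to one decimal place:

```python
STARTING_CAPITAL = 10_000.0

# (model, account value in USD, trades, reported win rate in %), copied from the rankings
snapshot = [
    ("Qwen3 Max", 13_242, 22, 31.8),
    ("DeepSeek Chat V3.1", 12_246, 9, 11.1),
    ("Claude Sonnet 4.5", 8_845, 12, 16.7),
    ("Grok 4", 8_338, 12, 8.3),
    ("Gemini 2.5 Pro", 3_832, 105, 26.7),
    ("GPT-5", 2_751, 44, 4.5),
]


def total_return_pct(account_value: float, start: float = STARTING_CAPITAL) -> float:
    """Percentage return relative to the starting balance."""
    return (account_value - start) / start * 100


def implied_wins(win_rate_pct: float, trades: int) -> int:
    """Nearest whole number of winning trades implied by a reported win rate."""
    return round(win_rate_pct / 100 * trades)


for name, value, trades, win_rate in snapshot:
    wins = implied_wins(win_rate, trades)
    print(f"{name:20s} return {total_return_pct(value):+7.2f}%  wins {wins}/{trades}")
```

Every reported win rate maps cleanly to a whole number of wins (for example, 7 of 22 for Qwen3 Max and 2 of 44 for GPT‑5), which makes this a quick consistency check on leaderboard numbers.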

Model‑by‑Model Analysis

🏅 Champion – Qwen3 Max

Key performance: highest absolute profit (+$3,242) and best return. Strategy features include:

Trend capture: correctly rode major moves in BTC and ETH.

Position management: max single‑trade profit $1,453, max loss $586.

Profit‑to‑loss ratio of roughly 2.5:1 (the largest win, $1,453, is about 2.5 times the largest loss, $586), which lets a modest 31.8% win rate still produce the top return.
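Why a 31.8% win rate can still come out on top: with a payoff ratio R (average win divided by average loss), the break-even win rate is 1 / (1 + R). A minimal sketch, assuming the ≈2.5:1 figure holds as an average payoff ratio across all trades (the article does not confirm this) and ignoring fees and funding:

```python
def breakeven_win_rate(payoff_ratio: float) -> float:
    """Win rate at which expected profit per trade is zero, given avg_win / avg_loss."""
    return 1.0 / (1.0 + payoff_ratio)


def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade; avg_loss is passed as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Assumed inputs: the ~2.5:1 ratio from the article, expressed per $1 risked.
payoff = 2.5
print(f"break-even win rate: {breakeven_win_rate(payoff):.1%}")  # 28.6%
print(f"expectancy per $1 risked at 31.8%: {expectancy(0.318, payoff, 1.0):+.3f}")  # +0.113
```

Under these assumptions the 31.8% win rate clears the roughly 28.6% break-even level by only about three percentage points, so the edge is thin once trading fees are counted.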

🥈 Runner‑up – DeepSeek Chat V3.1

Delivered a solid +22.46% with only nine trades, suggesting that low‑frequency, high‑quality trades can outperform higher‑frequency approaches. Notable points:

Sharpe ratio 1.059 – best risk‑adjusted performance.

Fees of only $136.60, reflecting its low trade count.

Win rate of only 11.1% (one winning trade in nine), so its profit came from the size of its winners rather than their frequency.
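The Sharpe ratio divides the mean excess return by the standard deviation of returns, which is why a steady strategy can rank above a more volatile one with a similar average. The article does not say how Alpha Arena computes it (sampling interval, risk-free rate, annualization), so the sketch below is a generic per-period version with a zero risk-free rate and no annualization; both return series are hypothetical, not Alpha Arena data:

```python
import statistics


def sharpe_ratio(period_returns: list[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return / sample standard deviation.

    Assumptions (not confirmed by the article): zero risk-free rate by default,
    no annualization, sample (n - 1) standard deviation.
    """
    excess = [r - risk_free for r in period_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / sd


# Hypothetical per-period returns with the same 1% mean but different volatility.
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
choppy = [0.080, -0.060, 0.050, -0.040, 0.020]
print(f"steady: {sharpe_ratio(steady):.2f}")
print(f"choppy: {sharpe_ratio(choppy):.2f}")
```

Both series average 1% per period, yet the steady one scores a far higher Sharpe ratio, which is the sense in which the metric rewards consistency over raw gains.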

🥉 Claude Sonnet 4.5

Loss narrowed to -11.55% (from -12.39% in an earlier snapshot). Its largest single profit was $1,807, but its largest single loss was $1,579; swings that large in both directions, together with a near‑zero Sharpe ratio (0.090), point to volatile trade execution.

Grok 4

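Mid‑table result: a -16.62% loss and the lowest win rate (8.3%, one winner in 12 trades) among the top four, yet the second‑highest Sharpe ratio in the field (0.396). Its 12 trades matched Claude's count and were far fewer than Gemini's 105 or GPT‑5's 44, so overtrading is unlikely to be the main cause of its loss.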

Gemini 2.5 Pro

Severe underperformance: a -61.68% loss over 105 trades, the highest fees ($908.27), and a negative Sharpe ratio (-1.036). Overtrading and a poor profit‑to‑loss balance appear to be the primary causes; fees alone consumed about 9% of the $10,000 starting capital.
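The fee totals can be put in proportion with a little arithmetic. A minimal sketch using only the three fee figures the article reports (DeepSeek's $136.60, GPT‑5's $293.79, and Gemini's $908.27); fees for the other models are not given:

```python
STARTING_CAPITAL = 10_000.0

# (model, total fees in USD, trades) as reported in the article
fee_data = [
    ("DeepSeek Chat V3.1", 136.60, 9),
    ("GPT-5", 293.79, 44),
    ("Gemini 2.5 Pro", 908.27, 105),
]


def fee_drag_pct(total_fees: float, start: float = STARTING_CAPITAL) -> float:
    """Fees as a percentage of starting capital."""
    return total_fees / start * 100


for name, fees, trades in fee_data:
    print(f"{name:20s} fees {fee_drag_pct(fees):.2f}% of capital, "
          f"${fees / trades:.2f} per trade")
```

Gemini's fees equal about 9.08% of its starting balance, yet its average fee per trade (about $8.65) is lower than DeepSeek's (about $15.18). The drag comes from the number of trades, not from expensive individual trades.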

GPT‑5

Worst outcome: -72.49% loss, 44 trades, win rate 4.5%, Sharpe -0.835. Failure analysis shows:

Extremely poor direction judgment – more than 95% of trades (42 of 44) closed at a loss.

Frequent reverse‑trend positions (shorting BTC, ETH, SOL during uptrends).

Lack of stop‑loss discipline – losses grew unchecked (max loss $621 vs max profit $265).
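One common guard against the reverse‑trend positions listed above is a trend filter that only permits entries in the direction of a moving average. The sketch below is a generic illustration, not how any Alpha Arena model actually trades; the 20‑period window and the helpers `allowed_direction` and `entry_permitted` are hypothetical:

```python
from statistics import fmean


def simple_moving_average(prices: list[float], window: int) -> float:
    """Average of the last `window` closing prices."""
    if len(prices) < window:
        raise ValueError("not enough prices for the requested window")
    return fmean(prices[-window:])


def allowed_direction(prices: list[float], window: int = 20) -> str:
    """Hypothetical trend filter: 'long' above the SMA, 'short' below, else 'flat'."""
    sma = simple_moving_average(prices, window)
    last = prices[-1]
    if last > sma:
        return "long"
    if last < sma:
        return "short"
    return "flat"


def entry_permitted(side: str, prices: list[float], window: int = 20) -> bool:
    """Permit an entry only if it matches the current trend direction."""
    return side == allowed_direction(prices, window)


# Hypothetical uptrend: each close 1% above the previous one.
uptrend = [100 * 1.01 ** i for i in range(30)]
print(entry_permitted("short", uptrend))  # False
print(entry_permitted("long", uptrend))   # True
```

A filter like this would have blocked shorting BTC, ETH, or SOL while price sat above its moving average; it does not fix poor timing within a trend, which is where stop‑losses come in.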

Market Environment

During the competition period, major cryptocurrencies displayed high volatility and unclear trends, favoring short‑term, precise entry/exit timing. Leverage in perpetual contracts amplified both gains and losses.
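Leverage scales the return on posted margin linearly with the underlying price move. A simplified sketch that ignores fees, funding payments, and Hyperliquid's actual margin and liquidation rules (which the article does not describe):

```python
def return_on_margin(entry: float, exit_price: float, leverage: float, side: str) -> float:
    """Fractional PnL on margin for a perpetual position (no fees or funding)."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    move = (exit_price - entry) / entry
    direction = 1.0 if side == "long" else -1.0
    return direction * move * leverage


# A 2% move against a 10x long wipes out 20% of the margin; a short gains the same.
print(f"{return_on_margin(100.0, 98.0, 10, 'long'):+.0%}")   # -20%
print(f"{return_on_margin(100.0, 98.0, 10, 'short'):+.0%}")  # +20%
```

Under this simplification, a move of 1/leverage against the position (10% at 10x) consumes the entire margin; real exchanges typically liquidate earlier because of maintenance‑margin requirements.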

Key Insights and Conclusions

Low‑frequency, high‑quality trades (DeepSeek) achieve superior risk‑adjusted returns.

Effective trend capture (Qwen3) combined with disciplined position sizing yields the highest absolute profit.

Sharpe ratio proved more indicative of sustainable performance than raw win rate.

Overtrading (Gemini) leads to excessive fees and multiplies the opportunities for costly mistakes.

A sub‑5% win rate (GPT‑5) points to unreliable market judgment and missing stop‑loss discipline.

Future Directions

Improving AI trading agents should focus on:

Stricter risk management – robust stop‑loss and position sizing.

Incorporating on‑chain data and sentiment analysis for better market context.

Multi‑time‑frame signal integration to adapt to both short‑term spikes and longer trends.

Adaptive strategies that switch styles based on volatility regimes.
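For the first item, a common approach is fixed‑fractional sizing: choose the stop‑loss price first, then size the position so that a stop‑out loses a fixed fraction of equity. A minimal sketch; the 1% risk fraction, the `Position` class, and the helper names are hypothetical, and fees, slippage, and funding are ignored:

```python
from dataclasses import dataclass


@dataclass
class Position:
    side: str     # "long" or "short"
    entry: float  # entry price
    stop: float   # stop-loss price
    size: float   # position size in units of the asset


def size_for_risk(equity: float, entry: float, stop: float, risk_fraction: float = 0.01) -> float:
    """Units to trade so that hitting the stop loses `risk_fraction` of equity."""
    risk_per_unit = abs(entry - stop)
    if risk_per_unit == 0:
        raise ValueError("stop must differ from entry")
    return equity * risk_fraction / risk_per_unit


def stop_hit(pos: Position, price: float) -> bool:
    """True once the current price has crossed the stop-loss level."""
    return price <= pos.stop if pos.side == "long" else price >= pos.stop


# $10,000 equity, long at 100 with a stop at 95: risk $100 -> 20 units.
units = size_for_risk(10_000, entry=100.0, stop=95.0)
pos = Position("long", 100.0, 95.0, units)
print(units)              # 20.0
print(stop_hit(pos, 96))  # False
print(stop_hit(pos, 94))  # True
```

With the stop 5% away, the position's notional value ($2,000) stays well below equity, and a stop‑out costs $100 no matter how wrong the direction call was; this caps the kind of unchecked losses described for GPT‑5.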

The Alpha Arena benchmark shows that AI agents can profit in real markets (two of the six did), but success depends heavily on risk control, trade discipline, and model‑specific strategy design.


Written by

ShiZhen AI

Tech blogger with over 10 years of experience at leading tech firms; an AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI‑driven efficiency, and leisure. AI leisure community: 🛰 szzdzhp001
