How a Low‑Cost Model Combo Matches Claude Fable 5 Performance at Half the Price

OpenRouter’s Fusion of Kimi K2.6, DeepSeek V4 Pro and Gemini 3 Flash achieves near‑identical DRACO benchmark scores to Claude Fable 5 while cutting total inference cost by about 80%, demonstrating the strength of multi‑model collaboration and cost‑effective LLM deployment.

Machine Learning Algorithms & Natural Language Processing

The article evaluates OpenRouter’s Fusion multi‑model approach, which sends a single request to several LLMs (Kimi K2.6, DeepSeek V4 Pro, Gemini 3 Flash), lets each model retrieve web data, and then has a judging model aggregate the answers, resolve contradictions, and produce a final response.
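The fan‑out‑then‑judge flow described above can be sketched in a few lines. This is a minimal illustration, not OpenRouter’s implementation: the `fuse` function, the `Model` type, and the toy `majority_judge` are all hypothetical names, the models are plain local callables rather than network calls, and a simple majority vote stands in for the LLM judge that resolves contradictions in the real system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: a "model" is any callable that maps a prompt to an answer.
Model = Callable[[str], str]
Judge = Callable[[str, Dict[str, str]], str]


def fuse(prompt: str, panel: Dict[str, Model], judge: Judge) -> str:
    """Send one prompt to every panel model in parallel, then let a judge merge the drafts."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        drafts = {name: future.result() for name, future in futures.items()}
    return judge(prompt, drafts)


def majority_judge(prompt: str, drafts: Dict[str, str]) -> str:
    """Toy stand-in for the judging model: return the most common draft answer."""
    return Counter(drafts.values()).most_common(1)[0][0]


# Stub panel members (assumptions for illustration; no real model is called).
panel = {
    "model-a": lambda p: "42",
    "model-b": lambda p: "42",
    "model-c": lambda p: "41",
}

final_answer = fuse("What is 6 * 7?", panel, majority_judge)
print(final_answer)
```

Swapping `majority_judge` for a function that prompts another model with all drafts would bring the sketch closer to the aggregation step the article describes.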

To measure overall capability, the authors used the DRACO deep‑research benchmark suite created by Perplexity AI, containing 100 real‑world tasks across academia, finance, law, medicine and technology. The suite applies 39 weighted scoring criteria covering answer correctness, depth, formatting and citation, with penalties for errors, unsafe advice or nonsensical output.
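To make the idea of weighted criteria with penalties concrete, here is a small sketch of one way such a rubric could be scored. The formula is an assumption, not DRACO’s published method: positive weights reward criteria that are met, negative weights penalize triggered failure modes, and the total is normalized by the maximum achievable positive weight and clamped to the range 0 to 1. The criterion names and weights are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # positive: reward when met; negative: penalty when triggered


def rubric_score(criteria: Iterable[Criterion], met: Set[str]) -> float:
    """Normalized weighted score in [0, 1] (assumed formula, not DRACO's own)."""
    criteria = list(criteria)
    max_points = sum(c.weight for c in criteria if c.weight > 0)
    earned = sum(c.weight for c in criteria if c.name in met)
    return max(0.0, min(1.0, earned / max_points))


# Hypothetical rubric with three rewards and one penalty.
rubric = [
    Criterion("answer_correct", 5.0),
    Criterion("sufficient_depth", 3.0),
    Criterion("cites_sources", 2.0),
    Criterion("unsafe_advice", -4.0),
]

clean = rubric_score(rubric, {"answer_correct", "cites_sources"})
penalized = rubric_score(rubric, {"answer_correct", "sufficient_depth", "cites_sources", "unsafe_advice"})
print(clean, penalized)
```

Under this assumed formula, a response that meets every positive criterion but also triggers a penalty can score below a response that meets fewer criteria cleanly.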

Results show that Fusion panels outperform individual models. The Opus 4.8 + GPT‑5.5 combo scores 69.0%, higher than either model achieves on its own. Most notably, the Kimi K2.6 + DeepSeek V4 Pro + Gemini 3 Flash trio reaches 64.7%, just 0.6 percentage points below Claude Fable 5’s 65.3% (Claude Fable 5 completed 93 of the 100 tasks because safety filters blocked the rest).

Cost analysis based on official pricing shows that Claude Fable 5 charges $10 per million input tokens and $50 per million output tokens, roughly double the rates of Opus 4.8. In contrast, DeepSeek V4 Pro costs $0.44 (input) and $0.87 (output) per million tokens; Gemini 3 Flash costs about $0.50 (input) and $3 (output) per million tokens; Kimi K2.6 uses cache‑based pricing, charging $0.95 per million input tokens for first‑time context and $0.16 for repeated (cached) context, plus $4 per million output tokens. Combined, the three‑model Fusion panel reduces total task cost by nearly 80% compared with Claude Fable 5.
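The per‑token prices above translate into per‑task costs as follows. The prices are the ones quoted in the article; everything else is an assumption made for illustration: each model is given the same hypothetical workload of 20,000 input and 5,000 output tokens per task, Kimi K2.6 is billed at its uncached input rate, and the cost of the judging model and of web retrieval is left out because the article does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices as quoted in the article (Kimi K2.6 at its uncached input rate).
PRICES = {
    "Claude Fable 5": Price(10.00, 50.00),
    "DeepSeek V4 Pro": Price(0.44, 0.87),
    "Gemini 3 Flash": Price(0.50, 3.00),
    "Kimi K2.6": Price(0.95, 4.00),
}


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the given token counts."""
    return (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


# Hypothetical per-task workload, identical for every model (an assumption).
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 5_000
PANEL = ["Kimi K2.6", "DeepSeek V4 Pro", "Gemini 3 Flash"]

fable_cost = call_cost(PRICES["Claude Fable 5"], INPUT_TOKENS, OUTPUT_TOKENS)
panel_cost = sum(call_cost(PRICES[name], INPUT_TOKENS, OUTPUT_TOKENS) for name in PANEL)
savings = 1 - panel_cost / fable_cost

print(f"Claude Fable 5: ${fable_cost:.4f} per task")
print(f"Fusion panel:   ${panel_cost:.4f} per task (judge and retrieval not included)")
print(f"Savings:        {savings:.1%}")
```

The savings figure this prints depends entirely on the assumed token counts and on the omitted judge and retrieval costs, so it should be read as a way to reason about the pricing rather than as a reproduction of the article’s reported reduction.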

The authors also observed that fusing a single model with itself can boost performance: Opus 4.8’s score rises from 58.8% alone to 65.5% when several instances run in a Fusion panel, a gain of 6.7 percentage points. This suggests that the aggregation and reasoning steps themselves improve answer quality.

For practical use, OpenRouter offers both a web UI and an API. The web interface lets users enable preset panels with one click or customize model combinations, while the API requires specifying the desired model list in the request parameters.

Overall, the study demonstrates that multi‑model collaboration via Fusion can come within a fraction of a percentage point of premium models like Claude Fable 5 while delivering substantially lower operating costs, making it an attractive option for enterprises and developers with high daily token volumes.

Reference links: https://openrouter.ai/blog/announcements/fusion-beats-frontier/ ; https://x.com/OpenRouter/status/2065856860435988482

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
