How a Low-Cost Model Combo Nearly Matches Claude Fable 5 Performance at a Fraction of the Price
OpenRouter’s Fusion of Kimi K2.6, DeepSeek V4 Pro and Gemini 3 Flash achieves near‑identical DRACO benchmark scores to Claude Fable 5 while cutting total inference cost by about 80%, demonstrating the strength of multi‑model collaboration and cost‑effective LLM deployment.
The article evaluates OpenRouter's Fusion multi-model approach, which sends a single request to several LLMs (Kimi K2.6, DeepSeek V4 Pro, Gemini 3 Flash) and lets each model retrieve web data. A judging model then aggregates the answers, resolves contradictions, and produces the final response.
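The fan-out and judging flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not OpenRouter's implementation: the `fuse` function, the `Panelist` and `Judge` types, and the stub panelists are all hypothetical, and the majority-vote judge is a deliberately simple stand-in for the LLM judge that Fusion uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical types: a panelist maps a prompt to an answer; a judge maps
# the prompt plus all candidate answers to one final answer.
Panelist = Callable[[str], str]
Judge = Callable[[str, Dict[str, str]], str]


def fuse(prompt: str, panel: Dict[str, Panelist], judge: Judge) -> str:
    """Send one prompt to every panelist in parallel, then let the judge merge."""
    if not panel:
        raise ValueError("panel must contain at least one model")
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(ask, prompt) for name, ask in panel.items()}
        candidates = {name: future.result() for name, future in futures.items()}
    return judge(prompt, candidates)


def majority_judge(prompt: str, candidates: Dict[str, str]) -> str:
    """Stand-in judge (assumption): return the most common candidate answer.

    A real judging model would read the answers and reconcile contradictions.
    """
    return Counter(candidates.values()).most_common(1)[0][0]


if __name__ == "__main__":
    # Stub panelists replace real model calls so the sketch runs offline.
    panel = {
        "model-a": lambda p: "Paris",
        "model-b": lambda p: "Lyon",
        "model-c": lambda p: "Paris",
    }
    print(fuse("What is the capital of France?", panel, majority_judge))
```

In a real deployment each panelist would wrap a network call to a model, and the judge would itself be a model prompted with the candidate answers.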
To measure overall capability, the authors used the DRACO deep‑research benchmark suite created by Perplexity AI, containing 100 real‑world tasks across academia, finance, law, medicine and technology. The suite applies 39 weighted scoring criteria covering answer correctness, depth, formatting and citation, with penalties for errors, unsafe advice or nonsensical output.
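To make the idea of weighted criteria with penalties concrete, here is a minimal scoring sketch. It assumes that each criterion carries a signed weight (positive for rewards, negative for penalties) and that a task score is the earned weight divided by the maximum positive weight, floored at zero. The `Criterion` class, the `rubric_score` function, the normalization rule, and the example weights are all assumptions for illustration; the article does not give DRACO's actual formula.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # positive = reward, negative = penalty (assumed convention)


def rubric_score(criteria: Iterable[Criterion], triggered: Set[str]) -> float:
    """Return earned weight / maximum positive weight, floored at 0.

    `triggered` holds the names of criteria the grader marked as applying
    to the answer, whether they are rewards or penalties.
    """
    criteria = list(criteria)
    max_points = sum(c.weight for c in criteria if c.weight > 0)
    if max_points <= 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.weight for c in criteria if c.name in triggered)
    return max(0.0, earned / max_points)


if __name__ == "__main__":
    # Hypothetical four-item rubric; the real suite uses 39 criteria.
    rubric = [
        Criterion("answer is correct", 5.0),
        Criterion("analysis has depth", 3.0),
        Criterion("sources are cited", 2.0),
        Criterion("gives unsafe advice", -4.0),
    ]
    print(rubric_score(rubric, {"answer is correct", "sources are cited"}))
```

Under this assumed rule, a penalty can cancel out points earned elsewhere, which matches the article's description of deductions for errors, unsafe advice, or nonsensical output.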
Results show that Fusion panels outperform individual models. The Opus 4.8 + GPT-5.5 combo scores 69.0%, higher than either model achieves on its own. Most notably, the Kimi K2.6 + DeepSeek V4 Pro + Gemini 3 Flash trio reaches 64.7%, only 0.6 percentage points below Claude Fable 5's 65.3%. Claude Fable 5 completed only 93 of the 100 tasks because of its safety filters.
Cost analysis based on official pricing shows that Claude Fable 5 charges $10 per million input tokens and $50 per million output tokens, roughly double the rates of Opus 4.8. By contrast, DeepSeek V4 Pro costs $0.44 (input) and $0.87 (output) per million tokens, and Gemini 3 Flash costs about $0.50 (input) and $3 (output). Kimi K2.6 prices input by cache status: $0.95 per million input tokens on first use and $0.16 per million for repeated (cached) context, plus $4 per million output tokens. Combined, the three-model Fusion panel reduces total task cost by nearly 80% compared with Claude Fable 5.
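The per-token prices above translate into task cost as follows. The sketch uses only the prices quoted in the article; the `Price` class, the `cost_usd` function, and the token counts are assumptions chosen for illustration. It does not try to reproduce the article's 80% figure, because that depends on real per-task token usage and on the judging model's cost, neither of which is given here.

```python
from dataclasses import dataclass
from typing import Optional

MILLION = 1_000_000


@dataclass(frozen=True)
class Price:
    """USD per million tokens, as quoted in the article."""
    input: float
    output: float
    cached_input: Optional[float] = None  # only Kimi K2.6 lists a cached rate


PRICES = {
    "Claude Fable 5": Price(input=10.0, output=50.0),
    "DeepSeek V4 Pro": Price(input=0.44, output=0.87),
    "Gemini 3 Flash": Price(input=0.50, output=3.0),
    "Kimi K2.6": Price(input=0.95, output=4.0, cached_input=0.16),
}


def cost_usd(price: Price, input_tokens: int, output_tokens: int,
             cached_input_tokens: int = 0) -> float:
    """Cost of one call; cached tokens fall back to the normal input rate."""
    cached_rate = price.cached_input if price.cached_input is not None else price.input
    return (input_tokens * price.input
            + cached_input_tokens * cached_rate
            + output_tokens * price.output) / MILLION


if __name__ == "__main__":
    # Assumed workload: every model reads 200k fresh tokens and writes 20k.
    for name, price in PRICES.items():
        print(f"{name}: ${cost_usd(price, 200_000, 20_000):.4f}")
```

A panel's total is the sum of its members' costs plus whatever the judging step adds, so the savings depend heavily on how many tokens each panelist consumes per task.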
The authors also observed that self‑fusion of a single model can boost performance: Opus 4.8’s score rises from 58.8% alone to 65.5% when run in a multi‑instance Fusion, indicating that the aggregation and reasoning steps themselves improve answer quality.
For practical use, OpenRouter offers both a web UI and an API. The web interface lets users enable preset panels with one click or customize model combinations, while the API requires specifying the desired model list in the request parameters.
Overall, the study demonstrates that multi-model collaboration via Fusion can reach near parity with premium models like Claude Fable 5 at substantially lower operating cost, making it an attractive option for enterprises and developers with high daily token volumes.
Reference links: https://openrouter.ai/blog/announcements/fusion-beats-frontier/ ; https://x.com/OpenRouter/status/2065856860435988482
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.
