FusionRoute: Token-Level Expert Routing and Self-Correction for Multi-LLM Collaboration

FusionRoute introduces a token‑level routing framework that dynamically selects the most suitable expert LLM for each token and adds a complementary generation step, enabling fine‑grained, stable multi‑model collaboration that outperforms existing sequence‑level and expert‑selection methods across diverse benchmarks.

Machine Heart

Recent progress in large language models (LLMs) suggests that scaling model size alone is not enough. An alternative is to combine specialized expert models, each strong at a sub-task such as mathematical reasoning, code generation, or instruction following. FusionRoute proposes a token-level collaboration paradigm that routes each generated token to the most appropriate expert.

The core of FusionRoute is a trainable router that, at every decoding step, outputs routing weights over the experts that determine which expert generates the next token. The router also produces its own logits, which serve as a complementary generation signal: they are merged with the selected expert's logits to form the final next-token distribution. This design turns the router from a simple selector into an active participant that can correct an expert's output where that expert is weak.
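The decoding step described above can be sketched in a few lines of pure Python. This is a minimal, hypothetical illustration rather than FusionRoute's implementation: it assumes that all experts share one vocabulary, that the router picks the expert with the highest routing weight, and that the complementary logits are simply added to that expert's logits before normalization. The function and variable names are invented for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_next_token_distribution(
    expert_logits: Sequence[Sequence[float]],  # one logit vector per expert (shared vocabulary assumed)
    routing_weights: Sequence[float],          # router's score for each expert at this step
    router_logits: Sequence[float],            # router's complementary logits over the same vocabulary
) -> List[float]:
    """Hypothetical FusionRoute-style step: select an expert, add router logits, renormalize."""
    # 1. Selection: the expert with the highest routing weight generates this token.
    k = max(range(len(routing_weights)), key=lambda i: routing_weights[i])
    # 2. Complement: add the router's logits to the selected expert's logits
    #    (additive fusion is an assumption of this sketch).
    fused = [e + r for e, r in zip(expert_logits[k], router_logits)]
    # 3. Normalize into a probability distribution over the vocabulary.
    return softmax(fused)


# Toy example: two experts over a 3-token vocabulary.
experts = [[2.0, 0.5, 0.1], [0.2, 1.5, 1.4]]
weights = [0.3, 0.7]      # the router prefers expert 1 at this step
router = [0.0, 0.0, 0.8]  # the router nudges probability mass toward token 2
probs = fused_next_token_distribution(experts, weights, router)
print([round(p, 3) for p in probs])
```

In the toy example the selected expert on its own slightly prefers token 1 (logit 1.5 vs. 1.4), but the router's +0.8 on token 2 makes token 2 the most likely token in the fused distribution. That is the kind of per-token correction the router is meant to provide; in a real system both sets of logits would come from neural networks at every step.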

Training proceeds in two stages. First, a supervised fine-tuning (SFT) stage teaches the router to predict routing weights and to generate complementary logits, using a next-token cross-entropy loss. The loss is computed only at “informative” token positions, where the experts disagree, so the router learns genuine differences in ability rather than fitting positions that any expert would handle equally well. Second, a CDPO (complementary Direct Preference Optimization) stage optimizes the router logits on preference data, treating the router's log-ratio as a bias term on top of the expert's prediction: the bias stays small where the expert is strong and grows where it is weak, so correction concentrates on expert failures.
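The two objectives can be sketched in pure Python. This is an illustration under stated assumptions, not FusionRoute's training code: it treats argmax disagreement between experts as the “informative” criterion, assumes a shared vocabulary and additive fusion of expert and router logits, and reads the CDPO log-ratio as the extra log-probability the router's logits add on top of a frozen expert, fed into the standard DPO loss. All function names and the default `beta` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def informative_mask(expert_logits: Sequence[Sequence[Sequence[float]]]) -> List[bool]:
    """Per position, True if the experts' greedy (argmax) tokens disagree.

    expert_logits[t][k] is expert k's logit vector at position t. Argmax disagreement
    is this sketch's stand-in for the "experts disagree" criterion.
    """
    mask = []
    for per_expert in expert_logits:
        top_tokens = {max(range(len(z)), key=lambda i: z[i]) for z in per_expert}
        mask.append(len(top_tokens) > 1)
    return mask


def masked_sft_loss(fused_logits: Sequence[Sequence[float]],
                    targets: Sequence[int], mask: Sequence[bool]) -> float:
    """Mean next-token cross-entropy over informative positions only."""
    losses = [-log_softmax(z)[y] for z, y, keep in zip(fused_logits, targets, mask) if keep]
    return sum(losses) / len(losses) if losses else 0.0


def router_log_ratio(expert_logits: Sequence[Sequence[float]],
                     router_logits: Sequence[Sequence[float]],
                     tokens: Sequence[int]) -> float:
    """Sequence log-ratio log p_fused(y) - log p_expert(y): what the router adds to a frozen expert."""
    total = 0.0
    for e, r, y in zip(expert_logits, router_logits, tokens):
        fused = [a + b for a, b in zip(e, r)]
        total += log_softmax(fused)[y] - log_softmax(e)[y]
    return total


def dpo_loss(ratio_chosen: float, ratio_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO objective -log(sigmoid(beta * margin)), computed stably."""
    margin = beta * (ratio_chosen - ratio_rejected)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Two positions, two experts, 3-token vocabulary.
expert_logits = [
    [[2.0, 0.1, 0.1], [1.8, 0.2, 0.0]],  # both experts pick token 0 -> not informative
    [[0.1, 2.0, 0.3], [0.1, 0.2, 2.5]],  # token 1 vs. token 2 -> informative
]
mask = informative_mask(expert_logits)  # [False, True]
fused = [[2.0, 0.1, 0.1], [0.1, 0.2, 2.5]]
loss = masked_sft_loss(fused, targets=[0, 2], mask=mask)
```

Because only the informative position contributes, the SFT loss here ignores position 0 entirely. In `router_log_ratio`, a router that outputs all-zero logits leaves the expert unchanged and yields a log-ratio of zero, which matches the intuition that the router should stay quiet where the expert is already strong.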

FusionRoute's mixed-training strategy combines SFT and CDPO in a single pipeline and avoids fine-tuning each expert separately. Because only the lightweight router is trained and the experts receive no gradient updates, heterogeneous models (e.g., Llama-3 and Gemma-2) can be assembled in a “plug-and-play” fashion.

Experiments on five benchmarks (GSM8K, MATH-500, HumanEval, MBPP, and IFEval) show that FusionRoute consistently outperforms baselines including sequence-level selection, token-level selection without complementary logits, model merging (DARE, Task Arithmetic), and directly fine-tuned models. Notably, on domain-specific tasks the method matches or exceeds the best individual expert, showing that it draws on each expert's strengths while compensating for their weaknesses. On a general-purpose dataset (PerfectBlend), pairwise comparisons against fine-tuned models, with GPT-4o as the reference, show higher win rates, indicating better overall generation quality.

The theoretical analysis proves that pure token-level expert selection suffers from inherent non-identifiability under a single-policy coverage assumption, which explains the instability of prior methods. Adding complementary logits expands the class of expressible policies, so a near-optimal policy can be recovered under weaker assumptions.
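To make the expressiveness argument concrete, the two policy classes can be written out as below. This is an illustrative formalization, not the paper's notation or proof. It assumes the experts share a vocabulary and uses $s$ for the decoding state (prompt plus generated prefix), $z_k(\cdot\mid s)$ for expert $k$'s logits, $k(s)$ for the router's choice among $K$ experts, and $r(\cdot\mid s)$ for the router's complementary logits.

```latex
\begin{aligned}
\Pi_{\text{sel}}  &= \bigl\{\pi : \pi(\cdot\mid s) = \operatorname{softmax}\bigl(z_{k(s)}(\cdot\mid s)\bigr),\ k(s) \in \{1,\dots,K\}\bigr\},\\
\Pi_{\text{comp}} &= \bigl\{\pi : \pi(\cdot\mid s) = \operatorname{softmax}\bigl(z_{k(s)}(\cdot\mid s) + r(\cdot\mid s)\bigr)\bigr\}.\\[4pt]
r \equiv 0 &\;\Rightarrow\; \Pi_{\text{sel}} \subseteq \Pi_{\text{comp}},\\
r(\cdot\mid s) = \log \pi^\star(\cdot\mid s) - z_{k(s)}(\cdot\mid s)
  &\;\Rightarrow\; \operatorname{softmax}\bigl(z_{k(s)} + r\bigr)(\cdot\mid s) = \pi^\star(\cdot\mid s).
\end{aligned}
```

Pure selection can only emit one of the $K$ expert distributions at each state, so if no expert is close to a target policy $\pi^\star$ at some state, no routing rule can recover it; the complementary term removes that restriction for any full-support $\pi^\star$. In practice $r$ is produced by a small router and is far from unrestricted, and this sketch does not reproduce the paper's identifiability analysis under single-policy coverage.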

From an engineering perspective, FusionRoute requires only a small router model, works with arbitrarily structured experts, and supports incremental addition of new experts without retraining the whole system. This makes it a practical, scalable solution for building robust multi‑LLM systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: large language models, AI research, self-correction, model merging, expert routing, token-level collaboration
Written by Machine Heart, a professional AI media and industry service platform.
