Tagged articles
Machine Heart
Jun 7, 2026 · Artificial Intelligence

FusionRoute: Token-Level Expert Routing and Self-Correction for Multi-LLM Collaboration

FusionRoute introduces a token‑level routing framework that dynamically selects the most suitable expert LLM for each token and adds a complementary generation step, enabling fine‑grained, stable multi‑model collaboration that outperforms existing sequence‑level and expert‑selection methods across diverse benchmarks.

AI research · expert routing · large language models
11 min read
Machine Heart
May 7, 2026 · Artificial Intelligence

OrthoReg: Simple Orthogonal Regularization to Eliminate Model Merging Conflicts

The paper introduces OrthoReg, a lightweight orthogonal regularization added during fine‑tuning that provably enforces weight orthogonality, thereby resolving conflicts in model merging and providing a theoretical explanation for the success of task arithmetic.

Deep Learning · OrthoReg · Orthogonal Regularization
12 min read
Machine Learning Algorithms & Natural Language Processing
Apr 18, 2026 · Artificial Intelligence

Model Ability Gets Squeezed Out in Multi‑Task Learning—How ESM Preserves It (CVPR 2026)

The paper shows that multi‑task models lose performance because tasks compete for the same internal subspace. It introduces Essential Subspace Merging (ESM), which separates the critical directions of each task and applies Polarized Scaling to keep multiple abilities stable, with significantly less degradation than traditional baselines.

ESD · ESM · Multi-Task Learning
16 min read
AI2ML AI to Machine Learning
Nov 3, 2025 · Artificial Intelligence

Smol Training Playbook: Secrets to Building World-Class LLMs

The article details the SmolLM3 3B‑parameter model, its architecture, dual‑mode inference, a three‑stage data‑curation strategy, rigorous ablation methods, preference optimisation (APO/DPO), model merging, and practical training‑stability tricks, offering a comprehensive guide for building high‑performing large language models.

APO · LLM training · context scaling
13 min read