Tagged articles

Model Merging

6 articles · Page 1 of 1

Jun 17, 2026 · Artificial Intelligence

How a 1.5B Parameter Model Can Add External Knowledge to Any Frozen LLM

The article analyzes MEMO, a framework that equips a frozen large language model with a lightweight 1.5B‑parameter memory model fine‑tuned on a target corpus, detailing its architecture, five‑step data synthesis pipeline, structured inference protocol, experimental advantages over RAG and fine‑tuning, as well as its limitations and future research directions.

Fine-tuning · Knowledge Integration · LLM

0 likes · 19 min read

Machine Heart

Jun 7, 2026 · Artificial Intelligence

FusionRoute: Token-Level Expert Routing and Self-Correction for Multi-LLM Collaboration

FusionRoute introduces a token‑level routing framework that dynamically selects the most suitable expert LLM for each token and adds a complementary generation step, enabling fine‑grained, stable multi‑model collaboration that outperforms existing sequence‑level and expert‑selection methods across diverse benchmarks.

AI research · Model Merging · expert routing

0 likes · 11 min read

Machine Heart

May 7, 2026 · Artificial Intelligence

OrthoReg: Simple Orthogonal Regularization to Eliminate Model Merging Conflicts

The paper introduces OrthoReg, a lightweight orthogonal regularization added during fine‑tuning that provably enforces weight orthogonality, thereby resolving conflicts in model merging and providing a theoretical explanation for the success of task arithmetic.

Deep Learning · Model Merging · OrthoReg

0 likes · 12 min read

Machine Learning Algorithms & Natural Language Processing

Apr 18, 2026 · Artificial Intelligence

Model Ability Gets Squeezed Out in Multi‑Task Learning—How ESM Preserves It (CVPR 2026)

The paper reveals that multi‑task models suffer performance drops because tasks compete for the same internal subspace, and introduces Essential Subspace Merging (ESM) which separates critical directions and uses Polarized Scaling to keep multiple abilities stable, achieving significantly lower degradation than traditional baselines.

ESD · ESM · Model Merging

0 likes · 16 min read

AI2ML AI to Machine Learning

Nov 3, 2025 · Artificial Intelligence

Smol Training Playbook: Secrets to Building World-Class LLMs

The article details the SmolLM3 3B‑parameter model, its architecture, dual‑mode inference, a three‑stage data‑curation strategy, rigorous ablation methods, preference optimisation (APO/DPO), model merging, and practical training‑stability tricks, offering a comprehensive guide for building high‑performing large language models.

APO · LLM training · Model Merging

0 likes · 13 min read

Baobao Algorithm Notes

Apr 16, 2024 · Artificial Intelligence

Merging Large Language Models Without GPUs: Task Vector, SLERP, TIES & DARE Explained

This article introduces four advanced model‑merging algorithms—Task Vector, SLERP, TIES, and DARE—explains their underlying principles, compares their strengths, and demonstrates a practical merge of Mistral‑7B, WizardMath‑7B and CodeLlama‑7B using the open‑source MergeKit toolkit.

AI · DARE · MergeKit

0 likes · 10 min read
