Merging Large Language Models Without GPUs: Task Vector, SLERP, TIES & DARE Explained
This article introduces four advanced model‑merging algorithms—Task Vector, SLERP, TIES, and DARE—explains their underlying principles, compares their strengths, and demonstrates a practical merge of Mistral‑7B, WizardMath‑7B and CodeLlama‑7B using the open‑source MergeKit toolkit.
Model merging overview
Model merging combines multiple pretrained language models into a single model, preserving quality while adding new capabilities. The process can be performed on CPU only, without additional fine‑tuning.
Task Vector
A task vector is a direction in a model’s weight space that encodes improved performance on a specific task; it is obtained by subtracting the pretrained weights from the weights of a model fine‑tuned on that task. Adding or subtracting task vectors edits model behavior efficiently, enabling performance gains, bias reduction, and knowledge injection without full fine‑tuning, as sketched below.
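A minimal sketch of the idea, assuming two ordinary PyTorch state dicts (the toy tensors and the scaling factor are placeholders, not values from the paper):

import torch

def build_task_vector(base_state, finetuned_state):
    # Task vector: element-wise difference between fine-tuned and pretrained weights.
    return {k: finetuned_state[k] - base_state[k] for k in base_state}

def apply_task_vector(base_state, task_vector, scale=1.0):
    # Adding the (scaled) task vector injects the task; a negative scale removes it.
    return {k: base_state[k] + scale * task_vector[k] for k in base_state}

# Toy example with random tensors standing in for real checkpoints.
base = {"w": torch.randn(4, 4)}
finetuned = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
task_vector = build_task_vector(base, finetuned)
edited = apply_task_vector(base, task_vector, scale=0.7)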
Paper:
https://arxiv.org/abs/2212.04089
SLERP
SLERP (Spherical Linear Interpolation) interpolates between two model weight vectors on the unit sphere, preserving each parent’s unique features and curvature.
Smooth transition between parameters.
Feature preservation for both models.
Geometric‑aware mixing that respects vector rotation.
SLERP workflow (see the sketch after this list):
Normalize input vectors to unit length, focusing on direction.
Compute the angle between the vectors from their dot product, then derive the two spherical weights from the interpolation coefficient.
Weight and sum the original vectors with the scaling factor to obtain the interpolated vector.
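The three steps map directly onto code. Below is a minimal sketch for two flattened weight tensors, assuming non‑zero inputs and falling back to plain linear interpolation when the vectors are nearly parallel (the tensor sizes and the interpolation factor t are illustrative only):

import torch

def slerp(v0, v1, t, eps=1e-8):
    # Step 1: normalize both vectors so only direction matters.
    v0_u = v0 / (v0.norm() + eps)
    v1_u = v1 / (v1.norm() + eps)
    # Step 2: angle between the vectors from their dot product.
    dot = torch.clamp(torch.dot(v0_u, v1_u), -1.0, 1.0)
    omega = torch.arccos(dot)
    if omega.abs() < 1e-4:
        # Nearly parallel: spherical and linear interpolation coincide.
        return (1 - t) * v0 + t * v1
    # Step 3: weight and sum the original vectors with the spherical coefficients.
    sin_omega = torch.sin(omega)
    return (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

a = torch.randn(8)
b = torch.randn(8)
mixed = slerp(a, b, t=0.5)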
Code repository:
https://github.com/Digitous/LLM-SLERP-Merge
TIES
TIES mitigates parameter interference that degrades performance when merging many models. It performs three operations (sketched in code after this list):
Reset parameters that changed only slightly during fine‑tuning, reducing redundancy.
Resolve sign conflicts across models.
Merge only parameters whose signs agree with the consensus.
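A simplified single‑tensor sketch of the three operations, assuming a shared base tensor and a list of fine‑tuned tensors (the density value and toy data are placeholders; the real MergeKit implementation works over full checkpoints and handles more edge cases):

import torch

def ties_merge(base, finetuned_list, density=0.5):
    # Step 1 (trim): keep only the largest-magnitude deltas, reset the rest to zero.
    deltas = []
    for ft in finetuned_list:
        delta = ft - base
        k = max(1, int(density * delta.numel()))
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        deltas.append(torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta)))
    stacked = torch.stack(deltas)
    # Step 2 (elect sign): the consensus sign is the sign of the summed deltas.
    consensus = torch.sign(stacked.sum(dim=0))
    # Step 3 (disjoint merge): average only deltas whose sign agrees with the consensus.
    agree = (torch.sign(stacked) == consensus).float()
    counts = agree.sum(dim=0).clamp(min=1)
    merged_delta = (stacked * agree).sum(dim=0) / counts
    return base + merged_delta

base = torch.zeros(6)
merged = ties_merge(base, [torch.randn(6), torch.randn(6)], density=0.5)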
Paper:
https://arxiv.org/abs/2306.01708
DARE
DARE (Drop And REscale) extends the TIES approach, merging models without extra training or GPU usage. It introduces two operations:
Delta‑parameter pruning: randomly set the majority of delta parameters (differences between fine‑tuned and pretrained weights) to zero, with minimal impact on performance.
Weight re‑scaling: adjust merged weights to keep output expectations roughly unchanged.
DARE workflow (see the sketch after this list):
Randomly drop most delta parameters, resetting the corresponding fine‑tuned weights to their pretrained values.
Average parameters from multiple models to create a unified model.
Re‑scale the merged weights to preserve expected performance.
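A minimal sketch of drop‑and‑rescale, assuming a drop rate p applied independently to each delta parameter and a rescale by 1/(1 - p) so the expected delta is preserved (the drop rate and toy tensors are illustrative assumptions):

import torch

def dare_delta(base, finetuned, drop_rate=0.9):
    # Delta parameters: differences between fine-tuned and pretrained weights.
    delta = finetuned - base
    # Drop: randomly reset most delta parameters back to the pretrained values.
    mask = (torch.rand_like(delta) > drop_rate).float()
    # Rescale: divide surviving deltas by (1 - drop_rate) to keep the expected delta unchanged.
    return delta * mask / (1.0 - drop_rate)

def dare_merge(base, finetuned_list, drop_rate=0.9):
    # Average the sparsified, rescaled deltas across models and add them back to the base.
    deltas = [dare_delta(base, ft, drop_rate) for ft in finetuned_list]
    return base + torch.stack(deltas).mean(dim=0)

base = torch.zeros(6)
merged = dare_merge(base, [torch.randn(6), torch.randn(6)], drop_rate=0.9)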
Paper:
https://arxiv.org/abs/2311.03099
Merge demonstration with MergeKit
Installation
python3 -m pip install --upgrade pip
git clone https://github.com/cg123/mergekit.git
cd mergekit && pip install -q -e .
YAML configuration (saved here as ultra_llm_merged.yaml) for merging Mistral‑7B, WizardMath‑7B, and CodeLlama‑7B using the TIES method
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: WizardLM/WizardMath-7B-V1.0
    parameters:
      density: 0.5
      weight:
        - filter: mlp
          value: 0.5
        - value: 0
  - model: codellama/CodeLlama-7b-Instruct-hf
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
  int8_mask: true
dtype: float16
Running the merge
mergekit-yaml ultra_llm_merged.yaml output_folder \
--allow-crimes \
--copy-tokenizer \
--out-shard-size 1B \
--low-cpu-memory \
--write-model-card \
--lazy-unpickle
Resource usage on a 30‑vCPU machine (values may vary with model size):
Download: ~5 minutes
Merge: ~7 minutes
Peak memory: 30 GB
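Once the merge finishes, the output folder is a standard Hugging Face checkpoint. A minimal smoke‑test sketch, assuming the transformers library is installed and the folder is named output_folder as in the command above (the prompt is just an example):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint produced by mergekit-yaml.
tokenizer = AutoTokenizer.from_pretrained("output_folder")
model = AutoModelForCausalLM.from_pretrained("output_folder", torch_dtype="auto")

# Quick generation check on a sample prompt.
inputs = tokenizer("Write a Python function that adds two numbers.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))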
MergeKit repository:
https://github.com/cg123/mergekit
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.