Fine‑Tune Any Large Model on Apple Silicon with mlx‑tune

This article introduces mlx‑tune, a community project that wraps the MLX library in Unsloth's API so that large language, vision, TTS, STT, OCR, and embedding models can be fine‑tuned locally on Apple Silicon Macs. It outlines the prototype‑to‑cloud workflow, walks through installation and code examples, and discusses the project's capabilities and limitations.


Mac fine‑tuning limitation

Unsloth relies on Triton, which does not support macOS. As a result, Mac users cannot run Unsloth locally: they must either rent cloud GPUs even for small experiments or rewrite their code against the native mlx‑lm API.

mlx‑tune solution

mlx‑tune (github.com/ARahim3/mlx-tune) wraps the MLX library with Unsloth’s API. A script written for a Mac can be run on a CUDA cluster by changing only the import statements.

# Unsloth (CUDA)
from unsloth import FastLanguageModel
from trl import SFTTrainer

# mlx-tune (Apple Silicon)
from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer

# The rest of the code is identical!

Supported training methods

SFT : standard instruction fine‑tuning.

DPO / ORPO / KTO / SimPO : full coverage of preference‑learning methods (a minimal DPO sketch follows this list).

GRPO : DeepSeek‑style training that samples multiple generations per prompt and optimizes against a reward.

CPT : continual pre‑training with decoupled learning rates.
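
To make the preference‑learning path concrete, here is a minimal DPO sketch. It assumes mlx‑tune mirrors TRL's DPOTrainer/DPOConfig the way it mirrors SFTTrainer; the trainer and config class names, and the dataset path, are assumptions rather than confirmed API.

from mlx_tune import FastLanguageModel, DPOTrainer, DPOConfig  # DPOTrainer/DPOConfig names assumed
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Hypothetical dataset; DPO expects "prompt"/"chosen"/"rejected" columns
dataset = load_dataset("my-org/preference-pairs", split="train[:100]")

trainer = DPOTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=DPOConfig(output_dir="dpo_outputs", beta=0.1, max_steps=50),
)
trainer.train()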

Multimodal capabilities

Vision : fine‑tuning of Gemma 4, Qwen3.5, PaliGemma, LLaVA, and Pixtral VLMs.

TTS : Orpheus, OuteTTS, Spark‑TTS, Sesame/CSM, Qwen3‑TTS.

STT : Whisper, Moonshine, Qwen3‑ASR, NVIDIA Canary, Voxtral.

Embedding : BERT, ModernBERT, Qwen3‑Embedding, Harrier (with contrastive learning).

OCR : DeepSeek‑OCR, GLM‑OCR, olmOCR, Qwen‑VL (built‑in CER/WER metrics).

Advanced features

MoE fine‑tuning : supports 39+ MoE architectures, including Qwen3.5‑35B, Mixtral, and the DeepSeek series (see the sketch after this list).

Gemma 4 Audio : 12‑layer Conformer tower for native 16 kHz audio processing.

LFM2 : Liquid AI hybrid convolution + GQA architecture.
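
Loading a MoE checkpoint uses the same FastLanguageModel calls as the dense examples below; only the checkpoint changes. A minimal sketch (the mlx-community checkpoint name is illustrative, not verified):

from mlx_tune import FastLanguageModel

# Same API as dense models; the checkpoint name below is illustrative
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Mixtral-8x7B-Instruct-v0.1-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Targeting the attention projections keeps LoRA off the per-expert FFN weights
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)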

Installation

Recommended installer: uv.

# Standard install
uv pip install mlx-tune
# With audio support
uv pip install 'mlx-tune[audio]'
brew install ffmpeg
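
A quick sanity check after installing (the __version__ attribute is an assumption, hence the getattr fallback):

# The import succeeding is the real test; version printing is best-effort
python -c "import mlx_tune; print(getattr(mlx_tune, '__version__', 'ok'))"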

Minimal SFT example (4‑bit quantized Llama‑3.2)

from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()
trainer.save_pretrained("lora_model")                # LoRA adapters only
trainer.save_pretrained_merged("merged", tokenizer)  # adapters merged into the base weights
trainer.save_pretrained_gguf("model", tokenizer)     # GGUF for Ollama / llama.cpp
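
Before exporting, the trained model can be exercised in‑process. This sketch assumes mlx‑tune carries over Unsloth's for_inference toggle and generate signature (the vision example below uses the analogous for_training, so the names are plausible but unconfirmed):

# Method names follow the Unsloth convention and are assumptions for mlx-tune;
# the exact tensor type / generate signature may differ on MLX.
FastLanguageModel.for_inference(model)
inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))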

Vision fine‑tuning example

from mlx_tune import FastVisionModel, UnslothVisionDataCollator, VLMSFTTrainer
from mlx_tune.vlm import VLMSFTConfig
from datasets import load_dataset

# The original snippet leaves `dataset` undefined; any image+text dataset in
# conversation format works. "unsloth/Radiology_mini" (used in Unsloth's
# vision demos) is an illustrative choice.
dataset = load_dataset("unsloth/Radiology_mini", split="train[:100]")

model, processor = FastVisionModel.from_pretrained("mlx-community/Qwen3.5-0.8B-bf16")
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)
FastVisionModel.for_training(model)
trainer = VLMSFTTrainer(
    model=model,
    tokenizer=processor,
    data_collator=UnslothVisionDataCollator(model, processor),
    train_dataset=dataset,
    args=VLMSFTConfig(max_steps=30, learning_rate=2e-4),
)
trainer.train()

TTS fine‑tuning example

from mlx_tune import FastTTSModel, TTSSFTTrainer, TTSSFTConfig, TTSDataCollator
from datasets import load_dataset, Audio

model, tokenizer = FastTTSModel.from_pretrained("mlx-community/orpheus-3b-0.1-ft-bf16")
model = FastTTSModel.get_peft_model(model, r=16, lora_alpha=16)

dataset = load_dataset("MrDragonFox/Elise", split="train[:100]")
dataset = dataset.cast_column("audio", Audio(sampling_rate=24000))  # resample audio to 24 kHz

trainer = TTSSFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=TTSDataCollator(model, tokenizer),
    train_dataset=dataset,
    args=TTSSFTConfig(output_dir="./tts_output", max_steps=60),
)
trainer.train()

Workflow overview

The same code base can be used for local prototyping on a Mac and for large‑scale training on a CUDA cluster; a conditional‑import sketch follows the diagram.

Local Mac (mlx-tune)            Cloud GPU (Unsloth)
├── Quick experiments           ├── Large-scale training
├── Small-dataset validation    ├── Full dataset
├── Seconds-level iteration     ├── Production-grade optimization
└── Same code ──────────────────└── Same code
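
In practice the backend switch can be a single conditional import; everything below the imports stays identical. A minimal sketch (the platform check itself is ours, not part of either library):

import platform

# Choose the backend by platform; the training code that follows is shared
if platform.system() == "Darwin" and platform.machine() == "arm64":
    from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig  # Apple Silicon
else:
    from unsloth import FastLanguageModel  # CUDA
    from trl import SFTTrainer, SFTConfig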

Export formats

HuggingFace : standard checkpoint.

GGUF : directly usable by Ollama or llama.cpp (see the commands after this list).

push_to_hub : one‑click upload to HuggingFace Hub.
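
To serve an exported GGUF with Ollama, point a Modelfile at the file. A sketch; the path below is illustrative, so adjust it to whatever save_pretrained_gguf actually wrote:

# Adjust the FROM path to the actual exported .gguf file
cat > Modelfile <<'EOF'
FROM ./model.gguf
EOF

ollama create my-finetune -f Modelfile
ollama run my-finetune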

Limitations

Training is slower than Unsloth on an A100 GPU; the gap comes down to the hardware.

GGUF export is restricted when the base model is itself quantized; starting from a non‑quantized base model is recommended.

Memory is limited by the unified memory of the Mac (up to 512 GB on Mac Studio).
