Old Zhang's AI Learning
Apr 12, 2026 · Artificial Intelligence

How to Deploy MiniMax-M2.7 Quantized Models Locally on macOS and Linux

This guide explains the 22 GGUF quantized versions of MiniMax-M2.7 released by Unsloth, compares their accuracy and size, recommends the UD‑Q4_K_XL model for the best quality‑to‑size trade‑off, and provides step‑by‑step instructions for local deployment via Unsloth Studio, llama.cpp, an API server, or the native MLX route, along with important pitfalls and performance‑tuning tips.

Dynamic 2.0 · Local Deployment · MLX
0 likes · 14 min read
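As a quick illustration of the size side of that trade-off (the figures below are generic back-of-envelope numbers, not the article's measurements): a GGUF file's size is roughly parameter count × effective bits per weight ÷ 8, plus a small amount of metadata overhead.

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: params * bits / 8 bytes, ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical 7B-parameter model at approximate effective bits per weight
# for common llama.cpp quantization levels:
for name, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    print(f"{name}: ~{approx_gguf_size_gb(7e9, bits):.1f} GB")
```

The same arithmetic explains why a 4-bit-class quant like UD‑Q4_K_XL is usually the sweet spot: it roughly halves the file size versus 8-bit while losing far less accuracy than 2-bit.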
Old Zhang's AI Learning
Mar 19, 2026 · Artificial Intelligence

Testing the Trending oMLX on Mac: Claude‑Opus‑4.6 Distilled and Qwen3.5‑9B Performance Review

The article evaluates oMLX, a Mac‑only LLM runtime built on Apple Silicon and MLX, walking through installation, UI features, memory usage, single‑request speed, benchmark results for Claude‑Opus‑4.6 and Qwen3.5‑9B, continuous‑batching throughput gains, Claude Code optimizations, multi‑model support, and an unsuccessful attempt to run a 27B model.

Apple Silicon · Claude Opus · MLX
0 likes · 9 min read
AI Engineering
Feb 27, 2026 · Artificial Intelligence

jina-grep: Adding Semantic Search Capabilities to Grep on Apple Silicon

Jina-grep is an open-source CLI that adds fast, MLX-powered semantic search to grep on Apple Silicon, offering three modes, sub-millisecond latency, high token throughput, and easy installation, making local code and log searching more accurate than keyword matching.

Apple Silicon · CLI · MLX
0 likes · 4 min read
AI Engineering
Feb 15, 2026 · Artificial Intelligence

Qwen3‑ASR Runs Natively on Apple Silicon via MLX for Full‑Speed Speech Recognition

A developer has re‑implemented the state‑of‑the‑art Qwen3‑ASR model in MLX, enabling native execution on Apple M1‑M4 chips with real‑time factors as low as 0.08, 4‑bit quantization speedups of 4.7×, multilingual support for 52 languages, and features such as word‑level timestamps and streaming transcription.

Apple Silicon · MLX · Quantization
0 likes · 5 min read
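For readers unfamiliar with real-time factor (RTF): it is processing time divided by audio duration, so the quoted RTF of 0.08 means a clip is transcribed in roughly 8% of its playback length. A quick sanity check of that figure:

```python
def processing_time_s(audio_seconds: float, rtf: float) -> float:
    """Real-time factor (RTF) = processing_time / audio_duration,
    so processing_time = audio_duration * RTF."""
    return audio_seconds * rtf

# At RTF 0.08, a 60-second clip takes about 4.8 s to transcribe.
print(round(processing_time_s(60, 0.08), 2))  # 4.8
```

An RTF below 1.0 is what makes streaming transcription feasible, since the model keeps up with (and here far outpaces) incoming audio.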
AI Engineering
Jan 7, 2026 · Artificial Intelligence

Unsloth-MLX: Fine‑Tune LLMs on Mac and Seamlessly Move Code to Cloud GPUs

Unsloth‑MLX leverages Apple’s MLX framework to let Mac users with Apple Silicon fine‑tune large language models locally with a single import change, offering zero‑cost migration to cloud GPUs, supporting SFT, DPO, ORPO, and GRPO training, and export to Hugging Face or GGUF formats.

Apple Silicon · GPU Cloud · LLM Fine-Tuning
0 likes · 4 min read
Architect's Alchemy Furnace
May 7, 2025 · Artificial Intelligence

Which LLM Inference Engine Reigns Supreme? A Deep Dive into Transformers, vLLM, Llama.cpp, SGLang, MLX and Ollama

This article provides a comprehensive comparison of popular large‑language‑model inference engines, including Transformers, vLLM, Llama.cpp, SGLang, MLX, and Ollama, detailing their core features, performance characteristics, hardware compatibility, concurrency support, and ideal use cases, plus practical installation guidance for Xinference.

Inference · LLM · MLX
0 likes · 17 min read