Tagged articles

DS4

3 articles · Page 1 of 1
AI Engineering
Jun 30, 2026 · Artificial Intelligence

Running DeepSeek V4 on M5 Max: 5 tps Speedup Without Large Memory

Developer Anemll demonstrates that the DS4 IQ2_Q2 version of DeepSeek V4 gains a 5‑tps throughput boost on an Apple M5 Max. The setup uses SSD‑streamed MoE sidecar loading to run large models without requiring large amounts of memory, and the article provides full build and execution instructions.

AI inference · Apple Silicon · DS4
0 likes · 8 min read
Old Zhang's AI Learning
May 17, 2026 · Artificial Intelligence

Why DeepSeek V4 Flash’s Quantized Model Is Gaining Traction

The DeepSeek V4 Flash quantized GGUF model and the dedicated ds4 inference engine, both released by antirez, offer a dramatically reduced number of activated parameters, a 1‑million‑token context window, aggressive KV‑cache compression, and hardware‑specific quantizations. Together these enable smooth local inference on high‑memory Macs and CUDA machines, at the cost of generality.

DS4 · DeepSeek V4 Flash · GGUF
0 likes · 11 min read