Tagged articles

DS4

3 articles

Jun 30, 2026 · Artificial Intelligence

Running DeepSeek V4 on M5 Max: 5 tps Speedup Without Large Memory

Developer Anemll shows that running the DS4 IQ2_Q2 build of DeepSeek V4 on an Apple M5 Max gains 5 tps of throughput. The setup uses SSD‑streamed MoE sidecar loading so that large models run without large amounts of RAM, and the article includes full build and run instructions.

AI inference · Apple Silicon · DS4

8 min read

Java Architect Essentials

May 29, 2026 · Artificial Intelligence

How Redis Creator Built a Metal‑Only Engine to Run DeepSeek V4 Flash at Full Speed on Mac

The ds4.c project, written by Redis founder Salvatore Sanfilippo, is a Metal‑only C inference engine. It combines asymmetric 2‑bit quantization with disk‑based KV caching to reach usable DeepSeek V4 Flash performance on high‑end Apple Silicon Macs, and it exposes OpenAI‑ and Anthropic‑compatible APIs.

Apple Silicon · C · DS4

9 min read

Old Zhang's AI Learning

May 17, 2026 · Artificial Intelligence

Why DeepSeek V4 Flash’s Quantized Model Is Gaining Traction

The quantized DeepSeek V4 Flash GGUF model and the dedicated ds4 inference engine, both released by antirez, combine a greatly reduced number of activated parameters, a 1‑million‑token context window, aggressive KV‑cache compression, and hardware‑specific quantizations. Together these enable smooth local inference on high‑memory Macs and CUDA machines, at the cost of generality.

DS4 · DeepSeek V4 Flash · GGUF

11 min read