Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac
Redis founder antirez has unveiled ds4.c, a pure‑C inference engine that uses Objective‑C and Metal to run DeepSeek V4 locally on Macs. It delivers about 27 tokens/s on an M3 Ultra, far slower than GPU servers, but it has no external dependencies and keeps data on the device.
Project Overview
ds4.c (DeepSeek 4 in C) is a pure‑C inference engine for the DeepSeek V4 large language model that runs locally on macOS.
Source composition: ~55 % C, ~30 % Objective‑C, ~14 % Metal.
Architecture
C implements the core inference logic, memory management, and data scheduling, with no language runtime adding overhead.
Objective‑C bridges to Apple’s Metal API, enabling GPU offload.
Metal kernels (≈14 % of the code) perform matrix multiplication and attention calculations on Apple Silicon GPUs.
The project contains no external dependencies; JSON parsing, file I/O, and an HTTP client are implemented in‑house.
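To make the GPU workload concrete, here is a minimal CPU reference, in plain C, for the matrix-vector product that dominates token-by-token inference. It is an illustrative sketch only: the function name matvec_f32 and the row-major float layout are assumptions, not code or conventions taken from the ds4.c source, and a Metal kernel would compute the same result in parallel across rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not taken from ds4.c.
 * Computes y = W * x, where W is a rows x cols matrix stored
 * row-major, x has cols elements, and y has rows elements. */
void matvec_f32(const float *w, const float *x, float *y,
                size_t rows, size_t cols)
{
    assert(w != NULL && x != NULL && y != NULL);

    for (size_t r = 0; r < rows; r++) {
        const float *row = w + r * cols; /* start of row r */
        float acc = 0.0f;

        /* Dot product of row r with the input vector. */
        for (size_t c = 0; c < cols; c++) {
            acc += row[c] * x[c];
        }
        y[r] = acc;
    }
}
```

Each output element depends on a single row of W, so the rows can be computed independently, which is what makes this operation a natural fit for GPU offload.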
Build and Execution
Clone the repository, compile it with the standard macOS toolchain, and run the resulting binary. Repository URL: https://github.com/antirez/ds4. Example commands: run git clone https://github.com/antirez/ds4.git, then run make (or build with Xcode) as described in the README.
Performance Benchmarks
Third‑party measurements on Apple Silicon:
Mac M3 Max – 26.68 tokens/s
Mac M3 Ultra – 27.39 tokens/s
For reference, vLLM on a professional GPU server yields 150–200 tokens/s, but such servers cost tens of thousands of dollars and their use involves queueing.
An M3 Ultra machine (≈¥30,000–40,000) therefore offers a locally runnable setup that is sufficient for typical coding or content‑generation workloads, although its throughput is well below that of high‑end GPU servers.
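The practical meaning of these rates is easier to judge as wall-clock time. The sketch below does that arithmetic in C; the function names and the 1,000-token response length used in the note that follows are illustrative assumptions, and the calculation assumes a steady decode rate, ignoring prompt processing time.

```c
#include <assert.h>

/* Seconds needed to generate n_tokens at a steady decode rate.
 * Illustrative helper; assumes the rate stays constant. */
double generation_seconds(double n_tokens, double tokens_per_second)
{
    assert(tokens_per_second > 0.0);
    return n_tokens / tokens_per_second;
}

/* How many times faster rate_a is than rate_b. */
double speedup(double rate_a, double rate_b)
{
    assert(rate_b > 0.0);
    return rate_a / rate_b;
}
```

With the figures quoted above, generation_seconds(1000, 27.39) comes to roughly 36.5 seconds on the M3 Ultra, against about 5 to 6.7 seconds at 150–200 tokens/s. That makes the GPU server roughly 5.5 to 7.3 times faster.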
Running inference entirely on the device keeps prompts and documents on the local machine, so they are never sent over the network.
Language Choice Rationale
Compared with alternatives:
Rust offers memory safety via ownership, but its abstractions can get in the way of the byte‑level control needed for maximum inference performance.
Python incurs high runtime overhead and many dependencies, making it unsuitable for low‑level inference.
C++ provides rich features but results in larger binaries and more complex builds; C is preferred for its simplicity, readability, and portability.
Projects such as llama.cpp use C++ and have a substantially larger codebase, whereas ds4.c remains concise and self‑contained.
Comparison with Other Engines
llama.cpp targets multiple models and hardware platforms, offering broader compatibility. ds4.c targets only DeepSeek V4 on macOS and prioritizes minimalism and fine‑grained control.
Lao Guo's Learning Space
AI learning, discussion, and hands‑on practice with self‑reflection
