Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac

Redis creator antirez has released ds4.c, an inference engine written mostly in C, with Objective‑C and Metal for GPU access, that runs DeepSeek V4 locally on Macs. It delivers about 27 token/s on an M3 Ultra: far slower than a GPU server, but dependency‑free and fully on‑device, so data stays private.

Lao Guo's Learning Space

Project Overview

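ds4.c (DeepSeek 4 in C) is an inference engine for the DeepSeek V4 large language model that runs locally on macOS. It is billed as pure C, although the C core is paired with Objective‑C and Metal code for GPU access.

Source composition: roughly 55% C, 30% Objective‑C, and 14% Metal.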

Architecture

C implements the core inference logic, memory management, and data scheduling, with no language runtime or garbage collector adding overhead.

Objective‑C bridges to Apple’s Metal API, enabling GPU offload.

Metal kernels (≈14 % of the code) perform matrix multiplication and attention calculations on Apple Silicon GPUs.
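To make concrete what a matrix-multiplication kernel computes, here is a minimal CPU reference in plain C. It is a sketch for illustration only, not code from ds4.c: the name matmul_ref and the row-major layout are assumptions, and a real GPU kernel would split the same arithmetic across many threads rather than loop serially.

```c
#include <stddef.h>

/* Reference matrix multiply (hypothetical helper, not from ds4.c):
 * C[m x n] = A[m x k] * B[k x n], all matrices stored row-major. */
void matmul_ref(const float *A, const float *B, float *C,
                size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}
```

A serial reference like this is commonly kept alongside GPU kernels so that their output can be checked against a known-good result on small inputs.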

The project contains no external dependencies; JSON parsing, file I/O, and an HTTP client are implemented in‑house.
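As an illustration of what dependency-free JSON handling can look like, the sketch below parses a single JSON string literal using only the C standard headers. It is not ds4.c's parser: the function name json_parse_string is hypothetical, \uXXXX escapes are deliberately rejected to keep it short, and raw control characters are not validated.

```c
#include <stddef.h>

/* Parse one JSON string literal starting at the opening quote in p.
 * Decodes the single-character escapes; \uXXXX is rejected to keep the
 * sketch short. Writes a NUL-terminated result into out (capacity cap).
 * Returns a pointer just past the closing quote, or NULL on error. */
const char *json_parse_string(const char *p, char *out, size_t cap) {
    size_t n = 0;
    if (cap == 0 || *p != '"') return NULL;
    p++;
    while (*p != '"') {
        char c = *p++;
        if (c == '\0') return NULL;        /* unterminated string */
        if (c == '\\') {
            switch (*p++) {
            case '"':  c = '"';  break;
            case '\\': c = '\\'; break;
            case '/':  c = '/';  break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:   return NULL;        /* \u, unknown escape, or end of input */
            }
        }
        if (n + 1 >= cap) return NULL;     /* no room for c plus the NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return p + 1;
}
```

Returning the position after the closing quote lets a caller continue scanning the rest of the document, which is the usual shape of a hand-written recursive-descent parser.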

Build and Execution

Clone the repository, compile with the standard macOS toolchain, and run the binary. Repository URL: https://github.com/antirez/ds4.

Example commands:

    git clone https://github.com/antirez/ds4.git
    make

Run make inside the cloned ds4 directory, or build with Xcode instead, as described in the README.

Performance Benchmarks

Third‑party measurements on Apple Silicon:

Mac M3 Max – 26.68 token/s

Mac M3 Ultra – 27.39 token/s

For reference, vLLM on a data‑center GPU server reaches 150–200 token/s, but such servers cost tens of thousands of dollars, and shared deployments often leave users waiting in a queue.

The M3 Ultra (roughly ¥30,000–40,000) is therefore a practical local option: its throughput is well below that of a high‑end GPU server, but it is sufficient for typical coding or content‑generation workloads.

Running inference entirely on the device keeps prompts and documents on the local machine, so nothing is sent over the network to a third‑party service.
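The practical meaning of these rates is easier to judge as waiting time. The sketch below converts a decode rate into seconds per response. The 1,000-token response length is a hypothetical example, the function names are made up for illustration, and the calculation ignores prompt processing and assumes a steady rate.

```c
#include <stdio.h>

/* Seconds needed to generate n_tokens at a steady decode rate. */
double generation_seconds(double n_tokens, double tokens_per_sec) {
    return n_tokens / tokens_per_sec;
}

/* Print the generation time for n_tokens at each rate quoted in the article. */
void print_comparison(double n_tokens) {
    const struct { const char *name; double rate; } rows[] = {
        { "Mac M3 Max",              26.68 },
        { "Mac M3 Ultra",            27.39 },
        { "GPU server (low end)",   150.0  },
        { "GPU server (high end)",  200.0  },
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-22s %6.2f token/s  %6.1f s\n", rows[i].name, rows[i].rate,
               generation_seconds(n_tokens, rows[i].rate));
}
```

Calling print_comparison(1000.0) shows that a 1,000-token answer takes about 36.5 seconds on the M3 Ultra, against roughly 5 to 6.7 seconds on the GPU server. By these figures the server is about 5.5 to 7.3 times faster.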

Language Choice Rationale

The case for C over the alternatives runs as follows:

Rust offers memory safety through ownership, but on this view its abstractions get in the way of the byte‑level control needed for maximum inference performance.

Python carries high interpreter overhead and a heavy dependency stack, making it unsuitable for low‑level inference code.

C++ provides rich features but results in larger binaries and more complex builds; C is preferred for its simplicity, readability, and portability.

Projects such as llama.cpp use C++ and have a substantially larger codebase, whereas ds4.c remains concise and self‑contained.

Comparison with Other Engines

llama.cpp targets multiple models and hardware platforms, offering broader compatibility. ds4.c targets only DeepSeek V4 on macOS and prioritizes minimalism and fine‑grained control.
