
Can Apple’s M3 Ultra Mac Studio Run Full‑Scale DeepSeek R1 at 11 Tokens/s?

Early adopters benchmarked the M3 Ultra‑powered Mac Studio running the 671‑billion‑parameter DeepSeek R1 model, achieving around 11 tokens per second in practice (up to 20 tokens/s theoretically), and compared its performance and cost against GPU‑based solutions and the newer M4 Max hardware.

Java Tech Enthusiast

Mar 18, 2025


Test Configuration

M3 Ultra Mac Studio: 32‑core CPU, 80‑core GPU, 32‑core Neural Engine

512 GB unified memory

1 TB SSD

Two units interconnected via Thunderbolt 5 (80 Gbps) and clustered with EXO Labs' distributed inference software

Performance Results

Running the full 671 B DeepSeek R1 model (8‑bit quantization) on the two‑node setup achieved 11 tokens/second in real‑world inference, against an estimated theoretical ceiling of about 20 tokens/second.

Quantization impact:

8‑bit: 9 – 21 tokens/s (observed range across community tests)

4‑bit: 16 – 18 tokens/s
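The quantization level also determines how many machines are needed in the first place. The following is a rough sketch, assuming the weight footprint is simply parameter count times bits per weight and ignoring KV cache, activations, and runtime overhead; the helper names are illustrative and not taken from any benchmark tool.

```python
import math

PARAMS = 671e9          # total DeepSeek R1 parameters (from the article)
NODE_MEMORY_GB = 512    # unified memory per Mac Studio (from the article)


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_weight / 8 / 1e9


def nodes_needed(footprint_gb: float, node_memory_gb: float = NODE_MEMORY_GB) -> int:
    """Smallest number of nodes whose combined memory can hold the weights."""
    return math.ceil(footprint_gb / node_memory_gb)


for bits in (8, 4):
    size = weight_footprint_gb(PARAMS, bits)
    print(f"{bits}-bit: ~{size:.1f} GB of weights, {nodes_needed(size)} node(s)")
```

Under these assumptions the 8‑bit weights (about 671 GB) exceed a single 512 GB machine, which is consistent with the two‑node setup, while the 4‑bit weights (about 335.5 GB) would fit on one unit.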

Cost Comparison

Each configured Mac Studio costs ¥74,249, so the dual‑node system is roughly ¥150,000. Prior to the M3 Ultra, achieving comparable local inference required six to seven NVIDIA A100 GPUs, costing close to ¥1 million.
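The cost figures can be checked with a few lines of arithmetic. This sketch uses only the prices quoted above; the ¥1,000,000 value rounds "close to ¥1 million", and the cost per token/s is an illustrative metric rather than one reported by the testers.

```python
UNIT_PRICE_CNY = 74_249        # price per configured Mac Studio (from the article)
NODES = 2                      # dual-node setup
A100_CLUSTER_CNY = 1_000_000   # assumption: "close to ¥1 million", rounded
MEASURED_TOKENS_PER_S = 11     # real-world throughput (from the article)

dual_node_cost = UNIT_PRICE_CNY * NODES                 # 148,498
cost_ratio = A100_CLUSTER_CNY / dual_node_cost          # about 6.7x
cost_per_tps = dual_node_cost / MEASURED_TOKENS_PER_S   # about 13,500 per token/s

print(f"Dual-node cost: ¥{dual_node_cost:,}")
print(f"A100 setup costs about {cost_ratio:.1f}x as much")
print(f"Cost per token/s: about ¥{cost_per_tps:,.0f}")
```

The exact total is ¥148,498, which the article rounds to ¥150,000.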

Community Benchmarks

Ollama GGUF format: 15.78 tokens/s

Apple‑optimized MLX format: 19.17 tokens/s

70 B DeepSeek R1 (8‑bit) on a single M3 Ultra: 11.3 tokens/s vs. 10.69 tokens/s on an M4 Max MacBook Pro
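To put these numbers side by side, the relative differences can be computed directly. This is a minimal sketch using only the throughput figures listed above; `percent_faster` is an illustrative helper.

```python
def percent_faster(a_tokens_per_s: float, b_tokens_per_s: float) -> float:
    """How much faster a is than b, as a percentage."""
    return (a_tokens_per_s / b_tokens_per_s - 1) * 100


mlx_vs_gguf = percent_faster(19.17, 15.78)       # MLX vs. Ollama GGUF
m3_ultra_vs_m4_max = percent_faster(11.3, 10.69)  # 70 B, 8-bit

print(f"MLX over GGUF: {mlx_vs_gguf:.1f}% faster")
print(f"M3 Ultra over M4 Max (70 B): {m3_ultra_vs_m4_max:.1f}% faster")
```

By this measure the MLX format is roughly 21% faster than GGUF, while the M3 Ultra leads the M4 Max by under 6% on the 70 B model.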

Model Architecture Explanation

DeepSeek R1 uses a Mixture‑of‑Experts (MoE) architecture: its 671 B parameters are spread across many expert sub‑networks, and only a subset of them is activated for each token. This reduces the active model size to roughly 37 B parameters per token during inference, which explains why the larger model can run faster than a dense 70 B version.
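A back‑of‑the‑envelope estimate shows how the active parameter count relates to the theoretical ceiling quoted earlier. The sketch below assumes token generation is limited by memory bandwidth, with every active weight read once per token. It uses Apple's published 819 GB/s memory bandwidth for the M3 Ultra and DeepSeek's published figure of about 37 B activated parameters per token; neither number comes from the benchmark itself, and interconnect and compute overhead are ignored.

```python
M3_ULTRA_BANDWIDTH_GB_S = 819   # assumption: Apple's published M3 Ultra spec


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float,
                                active_params_billion: float,
                                bits_per_weight: int) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# MoE: about 37 B active parameters per token (assumption, see lead-in), 8-bit
moe_ceiling = decode_ceiling_tokens_per_s(M3_ULTRA_BANDWIDTH_GB_S, 37, 8)
# Dense 70 B model: all parameters are active for every token, 8-bit
dense_ceiling = decode_ceiling_tokens_per_s(M3_ULTRA_BANDWIDTH_GB_S, 70, 8)

print(f"MoE 671 B (37 B active): ~{moe_ceiling:.1f} tokens/s ceiling")
print(f"Dense 70 B:              ~{dense_ceiling:.1f} tokens/s ceiling")
```

Under these assumptions the MoE ceiling comes out near 22 tokens/s, in line with the quoted figure of about 20, and the dense 70 B ceiling near 11.7 tokens/s, close to the 11.3 tokens/s observed on a single M3 Ultra.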

Future Outlook

Rumors indicate Apple may introduce an M4 Ultra at the upcoming WWDC, which could further improve local large‑model inference performance.

Reference Links

https://x.com/alexocheema/status/1899604613135028716

https://www.bilibili.com/video/BV1nkRnYTEWx/



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
