Can Apple’s M3 Ultra Mac Studio Run Full‑Scale DeepSeek R1 at 11 Tokens/s?
Early adopters have benchmarked the M3 Ultra Mac Studio running the 671‑billion‑parameter DeepSeek R1 model. They report about 11 tokens per second in practice, against a theoretical ceiling of about 20 tokens/s, and compare its performance and cost with GPU‑based solutions and the newer‑generation M4 Max.
Test Configuration
M3 Ultra Mac Studio: 32‑core CPU, 80‑core GPU, 32‑core Neural Engine
512 GB unified memory
1 TB SSD
Two units connected over Thunderbolt 5 (80 Gbps) and clustered with EXO Labs' exo distributed‑inference software
Performance Results
Running the full 671 B DeepSeek R1 model (8‑bit quantization) on the two‑node setup achieved 11 tokens/second in real‑world inference, against a theoretical ceiling of about 20 tokens/second.
Quantization impact:
8‑bit: 9 – 21 tokens/s (observed range across community tests)
4‑bit: 16 – 18 tokens/s
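A rough weights-only estimate suggests why the 8‑bit run was spread across two machines. The sketch below assumes the weights dominate memory use and ignores the KV cache, activations, and OS overhead; the function name and constant are illustrative, not taken from the benchmarks.

```python
import math

NODE_MEMORY_GB = 512  # unified memory per Mac Studio in the test configuration


def weight_footprint_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_nodes(footprint_gb: float, node_memory_gb: float = NODE_MEMORY_GB) -> int:
    """Smallest number of nodes whose combined memory holds the weights."""
    return math.ceil(footprint_gb / node_memory_gb)


for bits in (8, 4):
    size = weight_footprint_gb(671, bits)
    print(f"{bits}-bit: ~{size:.1f} GB of weights, at least {min_nodes(size)} node(s)")
```

By this estimate the 8‑bit weights (about 671 GB) exceed a single 512 GB machine, while the 4‑bit weights (about 335.5 GB) would fit on one, before accounting for runtime overhead.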
Cost Comparison
Each configured Mac Studio costs ¥74,249, so the dual‑node system is roughly ¥150,000. Prior to the M3 Ultra, achieving comparable local inference required six to seven NVIDIA A100 GPUs, costing close to ¥1 million.
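The arithmetic behind the comparison can be checked in a few lines. This sketch treats "close to ¥1 million" as exactly ¥1,000,000, which is an assumption made only for illustration.

```python
UNIT_PRICE_CNY = 74_249            # price per configured Mac Studio, from the article
A100_CLUSTER_CNY = 1_000_000       # assumption: "close to ¥1 million" rounded up

mac_total = 2 * UNIT_PRICE_CNY     # dual-node system
ratio = A100_CLUSTER_CNY / mac_total

print(f"Dual Mac Studio: ¥{mac_total:,}")
print(f"A100 setup costs roughly {ratio:.1f}x as much")
```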
Community Benchmarks
Ollama GGUF format: 15.78 tokens/s
Apple‑optimized MLX format: 19.17 tokens/s
70 B DeepSeek R1 (8‑bit) on a single M3 Ultra: 11.3 tokens/s vs. 10.69 tokens/s on an M4 Max MacBook Pro
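To put these figures side by side, the relative differences can be computed directly. The helper name below is illustrative; the inputs are the community numbers quoted above.

```python
def speedup_pct(new: float, base: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1) * 100


mlx_vs_gguf = speedup_pct(19.17, 15.78)    # MLX vs. Ollama GGUF
ultra_vs_max = speedup_pct(11.3, 10.69)    # M3 Ultra vs. M4 Max, 70 B at 8-bit

print(f"MLX over GGUF: {mlx_vs_gguf:.1f}% faster")
print(f"M3 Ultra over M4 Max: {ultra_vs_max:.1f}% faster")
```

On these numbers the MLX format is about a fifth faster than GGUF, while the M3 Ultra leads the M4 Max by only a few percent on the 70 B model.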
Model Architecture Explanation
DeepSeek R1 uses a mixture‑of‑experts (MoE) architecture: the 671 B parameters are divided among many expert sub‑networks, and only a subset is activated for each token. This reduces the active parameter count to roughly 37 B during inference, which explains why the larger model can run faster than a dense 70 B model.
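A back-of-envelope bound illustrates why the active parameter count matters more than the total. The sketch assumes decoding is limited by memory bandwidth, that each generated token reads every active weight once, and that the M3 Ultra offers about 819 GB/s of unified memory bandwidth (Apple's published figure, not stated in this article). It uses DeepSeek's published figure of 37 B activated parameters per token and ignores interconnect latency, KV cache reads, and compute time, so it is an upper bound rather than a prediction.

```python
M3_ULTRA_BANDWIDTH_GB_S = 819  # assumption: Apple's published memory bandwidth


def decode_ceiling_tokens_per_s(
    bandwidth_gb_s: float, active_params_billion: float, bits_per_param: int
) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_param / 8
    return bandwidth_gb_s / gb_read_per_token


moe_ceiling = decode_ceiling_tokens_per_s(M3_ULTRA_BANDWIDTH_GB_S, 37, 8)    # MoE, active params only
dense_ceiling = decode_ceiling_tokens_per_s(M3_ULTRA_BANDWIDTH_GB_S, 70, 8)  # dense 70 B model

print(f"671 B MoE (37 B active), 8-bit: ~{moe_ceiling:.1f} tokens/s ceiling")
print(f"Dense 70 B, 8-bit: ~{dense_ceiling:.1f} tokens/s ceiling")
```

Under these assumptions the MoE ceiling comes out near 22 tokens/s and the dense 70 B ceiling near 12 tokens/s, in line with the roughly 20 tokens/s theoretical figure and the 11.3 tokens/s measurement reported above.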
Future Outlook
Rumors indicate Apple may introduce an M4 Ultra at the upcoming WWDC, which could further improve local large‑model inference performance.
Reference Links
https://x.com/alexocheema/status/1899604613135028716
https://www.bilibili.com/video/BV1nkRnYTEWx/
