Tagged articles
Data Party THU
Jun 6, 2026 · Artificial Intelligence

How a 400B MoE Model Runs on iPhone 17 Pro with Flash‑MoE

The article details how the open‑source Flash‑MoE engine enables the roughly 400B‑parameter Qwen3.5‑397B‑A17B mixture‑of‑experts model to run on an iPhone 17 Pro, reaching about 0.6 tokens per second through a custom Metal pipeline, GCD‑driven SSD streaming, and aggressive caching strategies.

400B · Flash-MoE · LLM inference
6 min read