Machine Learning Algorithms & Natural Language Processing
May 3, 2026 · Artificial Intelligence
Running a 400B Mixture‑of‑Experts LLM on iPhone 17 Pro: Inside Flash‑MoE
This article explains how the open-source Flash-MoE engine runs a 400-billion-parameter Mixture-of-Experts language model on an iPhone 17 Pro. It reaches interactive token throughput by removing Python dependencies, building a custom Metal pipeline, and streaming weights directly from SSD.
Apple Silicon · Flash-MoE · GCD
