AI Frontier Lectures
Apr 12, 2025 · Artificial Intelligence
How ByteDance Scales Attn/MoE: Cost Models, Mesh Communication, and Network Hacks
This article analyzes ByteDance's MegaScale-Infer paper for large-scale mixture-of-experts (MoE) inference. It covers micro-batching, M:N attention-to-MoE ratios, cost-driven constrained search, the communication redesign around Mesh All-2-All, network latency challenges, and the NIC and routing solutions used to address them.
AI inference · ByteDance · Cost Optimization
7 min read
