Machine Heart
Apr 25, 2026 · Artificial Intelligence

Jeff Dean’s New Paper Shows Elastic Large‑Scale Distributed Pre‑Training Is Now Feasible

Decoupled DiLoCo, a new distributed training framework introduced by Jeff Dean and colleagues, enables resilient large‑scale AI pre‑training across heterogeneous hardware. By decoupling learners and combining lightweight syncers, adaptive quorum, and balanced tensor fragmentation, it dramatically improves goodput and reduces bandwidth while preserving model quality.

Bandwidth Reduction · Decoupled DiLoCo · Distributed Training
Baobao Algorithm Notes
Jul 11, 2024 · Artificial Intelligence

Why Separate Prefill and Decode? A Deep Dive into DistServe’s Split Inference Architecture

This article explores the two‑stage LLM inference pipeline, introduces the TTFT (time to first token) and TPOT (time per output token) metrics, and explains the motivation for prefill–decode separation. It then presents experimental comparisons between split and merged architectures and details DistServe's optimization techniques and parallel‑strategy modeling.

DistServe · Goodput · LLM Inference