Can Diffusion Models Outrun Traditional LLMs? Mercury Coder’s Speed & Architecture
The article analyzes Mercury Coder, a diffusion‑based language model that generates text and code in parallel, compares its speed and quality against traditional autoregressive LLMs like GPT‑4o‑mini using a ball‑collision benchmark, and discusses the underlying score‑entropy training, current limitations, and future multimodal potential.
Background: Traditional Autoregressive LLMs
Conventional large language models generate output token by token in a left‑to‑right, autoregressive fashion. This sequential processing creates a major speed bottleneck because each token must wait for the previous one to be predicted.
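The sequential dependency can be illustrated with a minimal sketch. The function toy_next_token is a hypothetical stand-in for a real model (it simply replays a fixed token list); the point is only that the loop needs one model call per emitted token, and each call waits on the previous result.

```python
from typing import Callable, List, Tuple

# Hypothetical fixed output used by the toy "model" below.
_TOY_OUTPUT = ["def", "add", "(", "a", ",", "b", ")", ":", "<eos>"]


def toy_next_token(prefix: List[str]) -> str:
    """Stand-in for a real LLM: returns the next token of a fixed sequence."""
    return _TOY_OUTPUT[len(prefix)]


def generate_autoregressive(
    next_token: Callable[[List[str]], str], max_len: int = 32
) -> Tuple[List[str], int]:
    """Generate left to right; returns the tokens and the number of model calls."""
    tokens: List[str] = []
    calls = 0
    while len(tokens) < max_len:
        tok = next_token(tokens)  # needs every previously generated token
        calls += 1
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens, calls


ar_tokens, ar_calls = generate_autoregressive(toy_next_token)
print(ar_tokens, ar_calls)
```

In this toy run the number of model calls grows linearly with output length, which is the bottleneck the paragraph above describes.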
Mercury’s Diffusion‑Based Approach
Mercury, developed by Inception Labs, replaces the autoregressive architecture with a diffusion model originally used for image, video, and audio generation (e.g., Midjourney, DALL‑E, Sora). The model starts from a rough answer estimate and iteratively refines it, allowing a neural network to modify many tokens simultaneously.
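A toy sketch of iterative parallel refinement follows. It is not Mercury's actual algorithm, which the article does not specify: toy_denoiser is a hypothetical stand-in that proposes a token for every position in one call, and the schedule (revealing an even share of masked positions per step) is an assumption chosen for simplicity.

```python
import random
from typing import List, Tuple

MASK = "<mask>"
# Hypothetical fixed answer that the toy denoiser always proposes.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":"]


def toy_denoiser(seq: List[str]) -> List[str]:
    """Stand-in for the network: proposes a token for every position at once."""
    return list(TARGET)


def generate_by_refinement(length: int, steps: int, seed: int = 0) -> Tuple[List[str], int]:
    """Start fully masked and fill in several positions per step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    calls = 0
    for step in range(steps):
        proposal = toy_denoiser(seq)  # one call covers all positions
        calls += 1
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        for i in rng.sample(masked, k):  # commit k positions in this step
            seq[i] = proposal[i]
    return seq, calls


refined, refine_calls = generate_by_refinement(length=len(TARGET), steps=3)
print(refined, refine_calls)
```

Here eight tokens are produced with three model calls instead of one call per token. A real diffusion model may also revise positions it has already filled, which this sketch leaves out.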
Training Method: Score Entropy
The October 2023 research paper co‑authored by Stefano Ermon, a co‑founder of Inception Labs, introduces a “score entropy” loss that extends continuous‑space score matching to discrete token data. During training, random noise (for example, masking) is added to tokens over multiple steps; the model learns, for each position, the ratio between the probability of the sequence with a candidate token y in that position and its probability with the current token x. At inference, the model starts from a fully masked state and gradually removes the noise, guided by the learned ratios.
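To make the idea of learning ratios concrete, the sketch below assumes the pointwise form s - r*log(s) + r*(log(r) - 1) for a predicted ratio s and a true ratio r, which is how the score‑entropy loss is usually written. That form is not spelled out in this article, so treat it as an assumption; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def score_entropy_term(s: float, r: float) -> float:
    """Assumed pointwise loss for predicted ratio s > 0 and true ratio r > 0.

    The term is non-negative and equals zero exactly when s == r.
    """
    return s - r * math.log(s) + r * (math.log(r) - 1.0)


def score_entropy(
    scores: Sequence[float],
    ratios: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of pointwise terms over candidate tokens y != x."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * score_entropy_term(s, r) for w, s, r in zip(weights, scores, ratios))


perfect = score_entropy([2.0, 0.5], [2.0, 0.5])  # predictions match the true ratios
off = score_entropy([1.0, 1.0], [2.0, 0.5])      # predictions miss the true ratios
print(perfect, off)
```

Because the term is minimized when the predicted ratio equals the true ratio, driving this loss down pushes the network toward the correct ratios while keeping its outputs positive.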
Benchmark Experiment
To evaluate Mercury Coder, the authors used a classic ball‑collision problem: generate HTML for a hexagon containing a particle that bounces off the edges, changing the edge color on each collision. The prompt was:
Write HTML code where the page center is a regular hexagon, a particle with an initial velocity moves inside, bounces on the hexagon boundary, and each bounce changes the boundary color randomly.
Mercury Coder’s output was compared with that of GPT‑4o‑mini. Mercury produced a working but less polished implementation; GPT‑4o‑mini handled the collision physics and hexagon rendering slightly better, though it lacked the color‑changing effect.
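The collision physics the prompt asks for comes down to reflecting the velocity about the inward normal of whichever edge the particle crosses. The sketch below is written in Python rather than HTML and is not either model's output; the function names and the fixed time step are assumptions for illustration.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def hexagon_vertices(radius: float = 1.0) -> List[Vec]:
    """Vertices of a regular hexagon centered at the origin, counter-clockwise."""
    return [(radius * math.cos(math.pi / 3 * k), radius * math.sin(math.pi / 3 * k))
            for k in range(6)]


def inward_normals(vertices: List[Vec]) -> List[Vec]:
    """Unit normals pointing into the polygon (vertices must be counter-clockwise)."""
    normals = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        length = math.hypot(ex, ey)
        normals.append((-ey / length, ex / length))
    return normals


def step(pos: Vec, vel: Vec, dt: float, vertices: List[Vec],
         normals: List[Vec]) -> Tuple[Vec, Vec, Optional[int]]:
    """Advance one time step; returns new position, velocity, and the edge hit (if any)."""
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt
    vx, vy = vel
    hit = None
    for i, ((px, py), (nx, ny)) in enumerate(zip(vertices, normals)):
        dist = (x - px) * nx + (y - py) * ny  # signed distance, positive inside
        dot = vx * nx + vy * ny               # negative when moving outward
        if dist < 0 and dot < 0:
            vx, vy = vx - 2 * dot * nx, vy - 2 * dot * ny  # mirror the velocity
            x, y = x - 2 * dist * nx, y - 2 * dist * ny    # mirror back inside
            hit = i  # the caller can recolor the boundary when this is not None
    return (x, y), (vx, vy), hit


verts = hexagon_vertices()
norms = inward_normals(verts)
new_pos, new_vel, hit_edge = step((0.0, 0.8), (0.0, 1.0), 0.1, verts, norms)
print(new_pos, new_vel, hit_edge)
```

In this run the particle crosses the top edge, its vertical velocity flips sign, and its speed is unchanged. The returned edge index is the hook where a page would pick a new random boundary color.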
Performance Trade‑offs
Mercury Coder can be 5–10× faster than traditional LLMs because many tokens are updated in parallel, which raises GPU utilization and leads to an estimated ten‑fold reduction in operating cost. However, each inference step of a diffusion model is computationally heavier and several refinement steps are needed, which can offset the speed advantage and leaves the actual cost savings uncertain.
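The trade‑off can be put into a rough back‑of‑the‑envelope model. The function and all numbers below are hypothetical and are not measurements of Mercury Coder; the model simply compares one unit‑cost pass per token against a fixed number of heavier refinement steps.

```python
def estimated_speedup(n_tokens: int, diffusion_steps: int, step_cost_ratio: float) -> float:
    """Rough latency ratio of autoregressive decoding to diffusion decoding.

    n_tokens:        output length; autoregressive decoding needs one pass per token
    diffusion_steps: number of refinement steps (assumed independent of length)
    step_cost_ratio: cost of one diffusion step relative to one autoregressive pass
    """
    autoregressive_cost = n_tokens * 1.0
    diffusion_cost = diffusion_steps * step_cost_ratio
    return autoregressive_cost / diffusion_cost


# Hypothetical numbers: 512 output tokens, 32 refinement steps.
cheap_steps = estimated_speedup(512, 32, step_cost_ratio=2.0)
heavy_steps = estimated_speedup(512, 32, step_cost_ratio=16.0)
print(cheap_steps, heavy_steps)
```

With steps twice as expensive as an autoregressive pass, this toy model gives an 8× speedup; with steps sixteen times as expensive, the advantage disappears. Where a real system falls between those cases depends on hardware and implementation details the article does not report.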
Future Prospects and Community Feedback
The diffusion paradigm offers finer control similar to image‑generation sketch guidance, enabling precise steering of output. It also opens the door to unified multimodal models that handle text, code, images, video, and audio with shared knowledge. Andrej Karpathy remarked on social media that such models could exhibit entirely new “psychological” characteristics and strengths.
Key Researchers and Publications
Stefano Ermon, a co‑founder of Inception Labs, and his co‑authors helped pioneer the application of diffusion to discrete data, publishing the score‑entropy paper in October 2023 and receiving a Best Paper award at ICML 2024. The model introduced in that paper, SEDD (Score Entropy Discrete Diffusion), demonstrated a 25–75 % perplexity reduction on language modeling benchmarks relative to earlier language diffusion approaches.