
Can Diffusion Models Outrun Traditional LLMs? Mercury Coder’s Speed & Architecture

The article analyzes Mercury Coder, a diffusion‑based language model that generates text and code in parallel, compares its speed and quality against traditional autoregressive LLMs like GPT‑4o‑mini using a ball‑collision benchmark, and discusses the underlying score‑entropy training, current limitations, and future multimodal potential.

AI Frontier Lectures

Mar 17, 2025


Background: Traditional Autoregressive LLMs

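Conventional large language models generate output one token at a time, left to right (autoregressively). Each token is predicted from all the tokens before it, so producing N tokens takes N sequential forward passes of the network. This sequential dependency is a major speed bottleneck: no token can be computed until the previous one has been predicted.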
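A minimal sketch of that sequential dependency, assuming a hypothetical `toy_next_token` function in place of a real model. Each call sees everything generated so far, so the calls cannot run in parallel.

```python
from typing import Callable, List, Tuple

# Hypothetical target answer that the toy "model" reproduces word by word.
TARGET = "def add ( a , b ) : return a + b".split()

def toy_next_token(context: List[str]) -> str:
    # Stand-in for an LLM forward pass: returns the next word of TARGET,
    # or "<eos>" when the answer is complete. context[0] is "<bos>".
    i = len(context) - 1
    return TARGET[i] if i < len(TARGET) else "<eos>"

def autoregressive_decode(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Greedy left-to-right decoding. Returns the generated tokens and the
    number of sequential model calls it needed."""
    tokens = list(prompt)
    calls = 0
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # depends on every token produced so far
        calls += 1
        if tok == eos:
            break
        tokens.append(tok)
    return tokens[len(prompt):], calls

out, calls = autoregressive_decode(toy_next_token, ["<bos>"], max_new_tokens=50)
print(" ".join(out), "| sequential calls:", calls)
```

For this 12-token answer the loop makes 13 model calls (the last one emits `<eos>`), strictly one after another.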

Mercury’s Diffusion‑Based Approach

Mercury, developed by Inception Labs, replaces autoregressive decoding with diffusion, the technique behind image, video, and audio generators such as Midjourney, DALL‑E, and Sora. Instead of committing to one token at a time, the model starts from a rough estimate of the whole answer and refines it over a number of steps, with the neural network modifying many tokens simultaneously at each step.
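The refinement loop can be sketched as masked-diffusion decoding. This is a toy under stated assumptions: `toy_denoiser` is a hypothetical oracle that already knows the answer (a real model would condition its guesses on the tokens revealed so far), and the schedule that unmasks the most confident positions first is one common choice in the masked-diffusion literature, not a documented detail of Mercury. Unlike a real sampler, this sketch never revises a token once it has been revealed.

```python
import math
import random
from typing import List, Tuple

MASK = "<mask>"
TARGET = "def add ( a , b ) : return a + b".split()

def toy_denoiser(seq: List[str], rng: random.Random) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a diffusion LM: ONE forward pass returns a
    (token, confidence) guess for EVERY position at once. This oracle ignores
    the content of seq and uses random confidences."""
    return [(TARGET[i], rng.random()) for i in range(len(seq))]

def diffusion_decode(length: int, steps: int, seed: int = 0) -> Tuple[List[str], int]:
    rng = random.Random(seed)
    seq = [MASK] * length  # start from a fully masked sequence
    calls = 0
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = toy_denoiser(seq, rng)  # all positions predicted in parallel
        calls += 1
        # Reveal enough positions that nothing is masked after `steps` passes.
        k = math.ceil(len(masked) / (steps - step))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq, calls

seq, calls = diffusion_decode(len(TARGET), steps=4)
print(" ".join(seq), "| parallel passes:", calls)
```

The point is the call count: the same 12 tokens are produced in 4 passes rather than 12, because every pass predicts all masked positions at once.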

Training Method: Score Entropy

The training method comes from an October 2023 research paper by Inception Labs’ co‑founders, which introduces a “score entropy” loss that extends score matching from continuous spaces to discrete token data. During training, tokens are corrupted with noise (masking) over multiple steps, and the model learns, for each position, the ratio between the probability that a candidate token y is correct and the probability of the current token x. At inference, generation starts from a fully masked sequence and removes the noise step by step, guided by these learned ratios.
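The objective can be written down concretely. The sketch below follows the form of the score-entropy loss as we understand it from the paper: for a state x and each alternative y, the model outputs an estimate s_y of the ratio r = p(y)/p(x) and pays s_y − r·log s_y + K(r), with K(a) = a(log a − 1). The weights w_xy are set to 1 here and the three-token distribution is made up. In real training the true ratios are unknown, so the paper works with an equivalent denoising form computed from the noised data; this toy evaluates the loss directly on a known distribution to show that it vanishes exactly when the estimates match the true ratios.

```python
import math
from typing import Dict

def K(a: float) -> float:
    # Model-independent constant that makes each term zero at its optimum; K(0) = 0.
    return a * (math.log(a) - 1.0) if a > 0 else 0.0

def term(s: float, r: float) -> float:
    """One score-entropy summand: s - r*log(s) + K(r).
    s > 0 is the model's ratio estimate, r the true ratio p(y)/p(x).
    It is >= 0 and equals 0 exactly when s == r."""
    return s - r * math.log(s) + K(r)

def score_entropy(p: Dict[str, float], x: str, s: Dict[str, float]) -> float:
    """Score entropy at state x with all weights w_xy = 1 (an assumption):
    the sum over y != x of term(s[y], p[y] / p[x])."""
    return sum(term(s[y], p[y] / p[x]) for y in p if y != x)

# Toy 3-token vocabulary; ratio estimates are for the state x = "a".
p = {"a": 0.5, "b": 0.3, "c": 0.2}
exact = {y: p[y] / p["a"] for y in p}  # perfect ratio estimates
off = {"a": 1.0, "b": 0.9, "c": 0.1}   # imperfect estimates
print(score_entropy(p, "a", exact))    # zero up to floating-point error
print(score_entropy(p, "a", off))      # strictly positive
```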

Benchmark Experiment

To evaluate Mercury Coder, the authors used a classic ball‑collision test: generate HTML for a regular hexagon containing a particle that bounces off the edges, with the edge color changing on each collision. The prompt was:

Write HTML code that draws a regular hexagon at the center of the page. A particle with an initial velocity moves inside it and bounces off the hexagon's boundary, and each bounce changes the boundary color randomly.

Mercury Coder’s output was compared with GPT‑4o‑mini’s. Mercury produced a working but less polished implementation; GPT‑4o‑mini handled the collision physics and hexagon rendering slightly better but omitted the color‑changing effect.
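To make "collision physics" concrete, here is the core update both models had to get right, written in Python rather than the HTML/JavaScript the prompt asks for. It is a hypothetical sketch (the names `hexagon` and `step` are ours), not output from either model: the particle moves in a straight line, and when it crosses an edge its position is mirrored back inside and its velocity is reflected with v' = v − 2(v·n)n, where n is that edge's inward unit normal.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]

def hexagon(cx: float, cy: float, r: float) -> List[Vec]:
    # Vertices of a regular hexagon centred at (cx, cy) with circumradius r.
    return [(cx + r * math.cos(math.pi / 3 * k), cy + r * math.sin(math.pi / 3 * k))
            for k in range(6)]

def step(pos: Vec, vel: Vec, verts: List[Vec], dt: float) -> Tuple[Vec, Vec, bool]:
    """Move the particle for dt, then resolve any edge it has crossed.
    Returns (new position, new velocity, whether a bounce occurred)."""
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt
    vx, vy = vel
    cx = sum(v[0] for v in verts) / len(verts)
    cy = sum(v[1] for v in verts) / len(verts)
    bounced = False
    for i in range(len(verts)):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % len(verts)]
        nx, ny = -(y2 - y1), x2 - x1            # a normal of edge i
        norm = math.hypot(nx, ny)
        nx, ny = nx / norm, ny / norm
        if (cx - x1) * nx + (cy - y1) * ny < 0:  # flip it to point inward
            nx, ny = -nx, -ny
        dist = (x - x1) * nx + (y - y1) * ny     # signed distance; < 0 means outside
        if dist < 0:
            x, y = x - 2 * dist * nx, y - 2 * dist * ny  # mirror position back inside
            dot = vx * nx + vy * ny
            if dot < 0:  # still heading out: reflect, v' = v - 2 (v.n) n
                vx, vy = vx - 2 * dot * nx, vy - 2 * dot * ny
            bounced = True
    return (x, y), (vx, vy), bounced

rng = random.Random(42)
verts = hexagon(0.0, 0.0, 100.0)
pos, vel, color = (0.0, 0.0), (37.0, 21.0), "#000000"
bounces = 0
for _ in range(1000):
    pos, vel, hit = step(pos, vel, verts, dt=0.05)
    if hit:
        bounces += 1
        color = "#%06x" % rng.randrange(0x1000000)  # new random edge colour
print(bounces, "bounces; final edge colour", color)
```

Reflection preserves the particle's speed, and the colour update on each bounce is the detail the article notes GPT‑4o‑mini left out.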

Performance Trade‑offs

Mercury Coder can reportedly be 5–10× faster than traditional LLMs because it updates many tokens in parallel, which keeps GPUs better utilized and could cut operating costs by an estimated factor of ten. However, each diffusion step is a forward pass over the entire sequence and is therefore more expensive than a single autoregressive step. The net speedup depends on how many refinement steps a response needs, so the actual cost savings remain uncertain.
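The trade-off reduces to simple arithmetic. The sketch below is a deliberately crude latency model with made-up parameters, not Mercury measurements: an autoregressive answer of N tokens costs N sequential passes of 1 unit each, while diffusion costs T sequential passes of c units each, so the speedup is roughly N / (T·c).

```python
def speedup(n_tokens: int, diffusion_steps: int, step_cost_ratio: float) -> float:
    """Latency ratio (autoregressive / diffusion) under a crude model:
    - autoregressive: n_tokens sequential passes, each costing 1 unit;
    - diffusion: diffusion_steps sequential passes, each costing
      step_cost_ratio units (a full-sequence pass, assumed > 1).
    All inputs are illustrative assumptions."""
    return (n_tokens * 1.0) / (diffusion_steps * step_cost_ratio)

def break_even_steps(n_tokens: int, step_cost_ratio: float) -> float:
    # Number of diffusion steps at which the parallel advantage disappears.
    return n_tokens / step_cost_ratio

# Hypothetical 1,000-token answer where each diffusion step costs 2x an AR step.
for steps in (50, 100, 250, 500):
    print(steps, "steps ->", round(speedup(1000, steps, 2.0), 2), "x")
print("break-even at", break_even_steps(1000, 2.0), "steps")
```

With these invented numbers the advantage falls from 10× at 50 steps to parity at 500 steps. The model ignores batching and memory-bandwidth effects, which matter a great deal in real serving, so it only illustrates why the per-step cost can offset the parallelism.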

Future Prospects and Community Feedback

The diffusion paradigm offers finer control, similar to sketch guidance in image generation, which could enable precise steering of the output. It also opens the door to unified multimodal models that share knowledge across text, code, images, video, and audio. Andrej Karpathy remarked on social media that such models could exhibit entirely new “psychological” characteristics and strengths.
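One concrete form of this control is infilling, the text analogue of image inpainting. In the toy sketch below, a masked-decoding loop with a hypothetical `toy_denoiser` that ignores context, the user pins tokens at arbitrary positions (including one near the end) before decoding starts. Pinned positions are never masked, so they survive every step. A plain left-to-right decoder can only condition on a prefix unless it has been trained for fill-in-the-middle. This illustrates the general idea, not Mercury's interface.

```python
import math
from typing import Dict, List

MASK = "<mask>"

def toy_denoiser(seq: List[str]) -> List[str]:
    # Hypothetical stand-in: proposes a token for every position in one pass.
    # A real model would condition these guesses on the pinned tokens; this toy does not.
    guess = "def add ( a , b ) : return a + b".split()
    return [guess[i] for i in range(len(seq))]

def infill(length: int, fixed: Dict[int, str], steps: int) -> List[str]:
    """Masked decoding with user-pinned positions: pinned tokens are written in
    before the first pass and are never treated as masked, so no step overwrites them."""
    seq = [fixed.get(i, MASK) for i in range(length)]
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = toy_denoiser(seq)
        k = math.ceil(len(masked) / (steps - step))
        for i in masked[:k]:  # reveal k masked positions (left to right, for simplicity)
            seq[i] = preds[i]
    return seq

out = infill(12, fixed={1: "mul", 10: "*"}, steps=3)
print(" ".join(out))
```

The pins override the toy model's own guesses, so the result is `def mul ( a , b ) : return a * b` even though the denoiser alone would produce `add` and `+`.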

Key Researchers and Publications

Stefano Ermon, a Stanford professor and Inception Labs co‑founder, and his collaborators pioneered the application of diffusion to discrete data. Their score‑entropy paper, posted in October 2023, received a Best Paper award at ICML 2024. It introduced SEDD (Score Entropy Discrete Diffusion), which reduced perplexity by 25–75% relative to earlier language diffusion models on language modeling benchmarks.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
