DiffusionGemma Boosts Text Generation Speed Up to 4× with Discrete Diffusion
Google’s open‑source DiffusionGemma model leverages a 26‑billion‑parameter Mixture‑of‑Experts architecture and discrete diffusion decoding to generate whole text blocks, achieving up to four times faster generation—over 1100 tokens/s on an NVIDIA H100 and 700 tokens/s on an RTX 5090—while activating only 3.8 billion parameters during inference.
