Tagged articles
3 articles
HyperAI Super Neural
Jun 12, 2026 · Artificial Intelligence

DiffusionGemma Boosts Text Generation Speed Up to 4× with Discrete Diffusion

Google’s open-source DiffusionGemma model combines a 26-billion-parameter Mixture-of-Experts architecture with discrete diffusion decoding to generate whole text blocks at once. Generation is up to four times faster, reaching over 1,100 tokens/s on an NVIDIA H100 and 700 tokens/s on an RTX 5090, while only 3.8 billion parameters are active during inference.

DiffusionGemma · Discrete Diffusion · GPU Acceleration
4 min read
Old Zhang's AI Learning
Jun 11, 2026 · Artificial Intelligence

Google’s 26B DiffusionGemma Model Delivers 1000+ Tokens/s – Runs on a 4090

DiffusionGemma is Google DeepMind’s 26B MoE model that generates 256-token blocks via diffusion. It achieves over 1,000 tokens per second on H100/H200 GPUs, offers FP8 and NVFP4 quantized versions with near-lossless accuracy, and can be deployed locally with vLLM Docker images. The trade-offs are higher first-token latency and limited concurrency.

26B model · DiffusionGemma · FP8 quantization
10 min read
Machine Heart
Jun 11, 2026 · Artificial Intelligence

Google Releases DiffusionGemma 26B MoE—Text Generation Up to 4× Faster

DiffusionGemma, Google's new 26-billion-parameter Mixture-of-Experts model, replaces token-by-token autoregression with a diffusion-style output head that generates whole text blocks. It delivers up to four-fold speed gains on consumer GPUs and offers bidirectional attention and self-correction, although its output quality is lower than that of standard Gemma 4.

DiffusionGemma · GPU Acceleration · Mixture of Experts
6 min read