Google Open‑Sources Gemma 4, Outperforming a 13×‑Larger Qwen 3.5

Google DeepMind released the open‑source Gemma 4 family—four model sizes ranging from 2 B to 31 B parameters, supporting text, images, video and audio, with up to 256 k token context, Apache 2.0 licensing, and benchmark results that place it on par with the 397 B Qwen 3.5 despite being far smaller.

On Thursday evening, Google DeepMind open‑sourced the Gemma 4 series, currently the strongest open‑source model family. Built on the same research as Gemini 3, the models rank third on the Arena AI leaderboard, surpass models up to 20 times larger in parameter count, and are released under an Apache 2.0 license that permits unrestricted commercial use.

Model Variants and Architecture

Gemma 4 includes four sizes: E2B, E4B, 26B A4B (26 B total parameters with 4 B activated per token) and 31B. The smallest models (E2B/E4B) target edge devices such as phones and tablets and were jointly optimised with Qualcomm and MediaTek. The larger models use a hybrid architecture that combines dense layers with a Mixture‑of‑Experts (MoE) design, allowing high‑capacity inference while keeping runtime efficient. The 31B model can run inference at its default 16‑bit precision on a single 80 GB H100 GPU, achieving capability comparable to the 397 B Qwen 3.5.
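
As a rough illustration of single‑GPU deployment, the sketch below loads the 31B checkpoint in 16‑bit precision with Hugging Face transformers. The model identifier is a hypothetical placeholder, since the release does not specify repository names here.

```python
# Minimal sketch: load a (hypothetical) Gemma 4 31B checkpoint on one 80 GB GPU.
# The model ID "google/gemma-4-31b-it" is an assumption for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-31b-it"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 16-bit weights: roughly 62 GB for 31 B parameters
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer(
    "Explain mixture-of-experts routing in two sentences.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```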

Key Features

High‑capability inference with configurable reasoning modes.

Extended multimodal support: text, variable‑aspect‑ratio images (all models), video and native audio (E2B/E4B).

Scalable architecture: dense and MoE variants for flexible deployment.

Device‑optimized designs: small models use Per‑Layer Embeddings (PLE) to maximise parameter efficiency on edge hardware.

Large context windows: 128 k tokens for E2B/E4B, 256 k tokens for 26B A4B and 31B.

Native system‑role prompting for more structured dialogues.

Enhanced coding and agent capabilities, including native function calling (see the sketch below).
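
To illustrate how a tool definition might be exposed to the model, here is a minimal sketch using the tools argument of transformers' chat templating. The model identifier is a hypothetical placeholder, and whether Gemma 4's chat template renders tools exactly this way is an assumption.

```python
# Minimal function-calling sketch. The model ID "google/gemma-4-e4b-it" is a
# hypothetical placeholder; tools= is a standard transformers feature, but the
# exact prompt it produces depends on the model's own chat template.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """
    Look up the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # stand-in for a real API call

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-e4b-it")
messages = [
    {"role": "system", "content": "You are a helpful assistant with tool access."},
    {"role": "user", "content": "What is the weather in Taipei right now?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)  # inspect how the tool schema is injected into the prompt
```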

Training Data and Pre‑processing

The pre‑training corpus is a massive, multilingual collection (over 140 languages) that includes web documents, source code, mathematics text, and images, with a cut‑off date of January 2025. Data cleaning pipelines apply strict CSAM filtering, automated removal of personal and other sensitive information, and additional quality‑based filters.

Memory and Quantisation

Gemma 4 models are available in 16‑bit default precision and can be quantised to lower‑bit formats. Smaller models (E2B/E4B) use PLE to keep the static weight footprint low, while the 26B A4B MoE model activates only 4 B parameters per token during generation but still requires loading all 26 B parameters into memory, leading to higher baseline VRAM usage.

Table 1 (omitted) lists the approximate GPU/TPU memory needed for each size and quantisation level; these figures account only for static weights and do not include runtime overhead such as KV‑cache for long contexts.
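
As a back‑of‑the‑envelope check on such figures, static weight memory is roughly the parameter count multiplied by bytes per parameter; the short sketch below applies that rule to the sizes listed above (nominal totals, ignoring PLE savings and all runtime overhead).

```python
# Rough static-weight memory estimate in GB: total parameters x bytes per parameter.
# Nominal parameter counts from the article; KV-cache and activations are ignored,
# so real usage, especially at 256 k context, will be higher.
SIZES_BILLION = {"E2B": 2, "E4B": 4, "26B A4B": 26, "31B": 31}
BYTES_PER_PARAM = {"bf16": 2, "int8": 1, "int4": 0.5}

for name, params_b in SIZES_BILLION.items():
    estimate = {fmt: round(params_b * b, 1) for fmt, b in BYTES_PER_PARAM.items()}
    print(f"{name:>8}: {estimate}")  # e.g.  31B: {'bf16': 62.0, 'int8': 31.0, 'int4': 15.5}
```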

Benchmark Results

Extensive evaluations on a wide range of text‑generation benchmarks (instruction‑tuned models) show Gemma 4 achieving top‑tier scores across tasks such as reasoning, coding, multilingual understanding, and multimodal perception. Sample outputs demonstrate GUI element detection, HTML page reconstruction, and multi‑modal reasoning.

Core Capabilities

Step‑by‑step reasoning before answering.

Long‑context handling up to 256 k tokens.

Image understanding: object detection, OCR, document parsing, chart analysis, and visual grounding (see the sketch after this list).

Video understanding via frame‑sequence processing.

Interleaved multimodal inputs within a single prompt.

Native function calling for tool use and autonomous agents.

Code generation, completion and correction.

Support for over 35 languages (140+ languages in pre‑training).

Audio capabilities (ASR and speech‑to‑text translation) in the E2B/E4B models.
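
For concreteness, a minimal image‑understanding sketch using transformers' image‑text‑to‑text pipeline might look like the following; the model identifier and image URL are placeholders, not names confirmed by the release.

```python
# Minimal chart-analysis sketch with the image-text-to-text pipeline.
# "google/gemma-4-e4b-it" and the image URL are hypothetical placeholders.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-e4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/quarterly-revenue.png"},
            {"type": "text", "text": "Summarise the trend shown in this chart."},
        ],
    }
]
result = pipe(text=messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # assistant's reply
```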

Overall, Gemma 4 combines dense and MoE architectures, extensive multimodal support, large context windows, and open licensing to provide a versatile, high‑performance foundation model suitable for everything from edge devices to high‑end servers.

Tags: benchmark, open-source AI, multimodal LLM, Apache 2.0, Google DeepMind, MoE architecture, Gemma 4
Written by Machine Heart, a professional AI media and industry service platform.