Critical CVE-2026-7482 'Bleeding Llama' in Ollama: Why You Must Upgrade Now

Ollama versions before 0.17.1 contain a heap out-of-bounds read vulnerability (CVE-2026-7482, CVSS 9.1). An attacker who uploads a malicious GGUF file can read server memory, including environment variables and API keys, and exfiltrate it. With more than 300,000 Ollama servers reachable from the public Internet, immediate upgrade and hardening are essential.

Old Zhang's AI Learning

Ollama versions prior to 0.17.1 contain a CVSS 9.1 heap out‑of‑bounds read vulnerability in the GGUF model loader, identified as CVE‑2026‑7482 “Bleeding Llama” by Cyera researcher Dor Attias.
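
To check whether an installation falls in the affected range, the version comparison can be sketched as below. This is a minimal sketch, not an official tool: the helper names `parse_version` and `is_affected` are hypothetical, and it assumes the version string (for example, the output of `ollama --version`) contains a plain MAJOR.MINOR.PATCH number. Pre-release suffixes are ignored, which is a simplification.

```python
import re

# First patched release according to this article.
FIXED_VERSION = (0, 17, 1)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'v0.17.0'.

    Any suffix after the patch number (e.g. '-rc1') is ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no MAJOR.MINOR.PATCH version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text: str) -> bool:
    """Return True when the version is older than the first fixed release."""
    return parse_version(version_text) < FIXED_VERSION


if __name__ == "__main__":
    for candidate in ("0.16.3", "v0.17.0", "0.17.1", "0.18.0"):
        status = "affected" if is_affected(candidate) else "patched"
        print(candidate, status)
```

Comparing integer tuples avoids the classic string-comparison mistake where "0.9.0" sorts after "0.17.1".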

The flaw resides in fs/ggml/gguf.go and in the WriteTo() function of server/quantization.go. During model quantization, the code uses Go's unsafe package, which bypasses the language's memory-safety checks, and the loader does not verify that the tensor offset and size declared in a GGUF file fit within the actual file length.
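
The missing check described above is a range validation: a tensor's declared offset plus its declared size must not run past the end of the file. The following is an illustrative sketch in Python, not Ollama's actual patch. `TensorInfo`, `tensor_in_bounds`, and `validate_tensors` are hypothetical names, and the header layout is reduced to the three fields the check needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical, simplified view of one tensor entry in a model file."""
    name: str
    offset: int  # byte offset relative to the start of the tensor data region
    size: int    # number of bytes the header claims the tensor occupies


def tensor_in_bounds(info: TensorInfo, data_start: int, file_length: int) -> bool:
    """Return True only if the declared byte range lies inside the file."""
    if info.offset < 0 or info.size < 0 or data_start < 0:
        return False
    # Python integers do not overflow; in a fixed-width language this
    # addition would also need an explicit overflow guard.
    end = data_start + info.offset + info.size
    return end <= file_length


def validate_tensors(tensors, data_start: int, file_length: int) -> list:
    """Return the names of tensors whose declared range exceeds the file."""
    return [t.name for t in tensors
            if not tensor_in_bounds(t, data_start, file_length)]


if __name__ == "__main__":
    tensors = [
        TensorInfo("ok.weight", offset=0, size=512),
        TensorInfo("evil.weight", offset=512, size=1_000_000),
    ]
    print(validate_tensors(tensors, data_start=128, file_length=1024))
```

A loader that rejects the file whenever this list is non-empty never reaches the point where it reads beyond the mapped data.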

Attack chain


Upload a maliciously crafted GGUF file with an excessively large tensor shape to a publicly reachable Ollama server via HTTP POST.

Trigger the vulnerability by calling /api/create, causing the out‑of‑bounds read during quantization.

Use /api/push to push the leaked memory as a model artifact to an attacker‑controlled registry.

The leaked data can include environment variables, API keys, system prompts, and other users’ conversation data. Dor Attias notes that engineers who connect Ollama to Claude Code as a local inference backend may inadvertently expose all tool inputs and outputs.

Impact

The Hacker News cites Cyera data indicating that more than 300,000 Ollama servers are exposed to the public Internet. Ollama's GitHub repository has over 171k stars and 16k forks, which suggests how widely the software is deployed.

Mitigation

Upgrade immediately to version 0.17.1 (release notes: https://github.com/ollama/ollama/releases/tag/v0.17.1).

Do not expose Ollama directly to the Internet; place it behind a firewall.

Audit existing instances for unintended public exposure.

Add an authentication proxy or API gateway, as Ollama lacks built‑in authentication.

Review instance logs for suspicious /api/create or /api/push requests.
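
The log review in the last step can be partly automated. The sketch below scans log lines for POST requests to /api/create or /api/push. The log format is an assumption: the sample lines are made up for illustration, and the pattern only relies on the HTTP method being followed by the request path, so it should be adapted to whatever your Ollama instance or reverse proxy actually writes. `suspicious_requests` is a hypothetical helper name.

```python
import re

# Matches e.g. POST "/api/create" or POST /api/push (quote is optional).
SENSITIVE_REQUEST = re.compile(r'\bPOST\s+"?(/api/(?:create|push))\b')


def suspicious_requests(lines):
    """Return (line_number, path, line) for each POST to create or push."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SENSITIVE_REQUEST.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines; real logs will look different.
    sample = [
        '2026-01-01T00:00:00Z 203.0.113.7 POST "/api/create" 200',
        '2026-01-01T00:00:01Z 127.0.0.1 GET "/api/tags" 200',
        '2026-01-01T00:00:02Z 203.0.113.7 POST /api/push 200',
    ]
    for number, path, line in suspicious_requests(sample):
        print(f"line {number}: {path}: {line}")
```

A hit is not proof of compromise, since both endpoints have legitimate uses. Requests from unfamiliar addresses, or a create followed shortly by a push to an unknown registry, are the ones that deserve a closer look.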

In short, the “Bleeding Llama” vulnerability demonstrates the concrete risk of running Ollama without authentication: an attacker can read any secret data held in the process heap, not merely consume GPU resources.
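
Because Ollama has no built-in authentication, the check has to live in whatever proxy or gateway sits in front of it. As a minimal sketch of that check, assuming a single shared bearer token (a design choice made here for illustration, not something the source prescribes), the core comparison could look like this. `is_authorized` is a hypothetical helper; a real deployment would call it from the proxy before forwarding any request.

```python
import hmac


def is_authorized(authorization_header, expected_token: str) -> bool:
    """Validate an 'Authorization: Bearer <token>' header value.

    Returns False for a missing header, a non-Bearer scheme, an empty
    configured token, or a token that does not match.
    """
    if not authorization_header or not expected_token:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(presented.strip().encode(),
                               expected_token.encode())


if __name__ == "__main__":
    token = "example-token"  # placeholder; load from a secret store in practice
    print(is_authorized("Bearer example-token", token))
    print(is_authorized("Bearer wrong-token", token))
    print(is_authorized(None, token))
```

Rejecting requests when the configured token is empty prevents a misconfigured proxy from silently accepting everyone.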

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
