Critical CVE-2026-7482 'Bleeding Llama' in Ollama: Why You Must Upgrade Now
Ollama versions before 0.17.1 are affected by CVE-2026-7482, a heap out-of-bounds read rated CVSS 9.1. An attacker can upload a malicious GGUF file, read server memory (including environment variables and API keys), and exfiltrate what was read. With more than 300,000 servers publicly exposed, immediate upgrade and hardening are essential.
Ollama versions prior to 0.17.1 contain a CVSS 9.1 heap out‑of‑bounds read vulnerability in the GGUF model loader, identified as CVE‑2026‑7482 “Bleeding Llama” by Cyera researcher Dor Attias.
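The flaw resides in fs/ggml/gguf.go and in the WriteTo() function of server/quantization.go. The quantization path uses Go's unsafe package, which bypasses the language's memory-safety checks, and the loader does not verify that the tensor offset and size declared in a GGUF file fit within the actual file length. A file that declares a tensor larger than the data it actually contains therefore causes reads past the end of the buffer into adjacent heap memory.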
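The missing check described above can be illustrated with a minimal sketch. This is not Ollama's code or its actual fix: the tensorInfo type, the validateTensorBounds function, and the dataStart parameter are hypothetical names chosen for illustration. The sketch only shows the general technique of validating a declared offset and size against the real file length, with guards against integer wraparound.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// tensorInfo is a hypothetical, simplified view of one tensor entry in a
// GGUF header. It is NOT Ollama's actual type.
type tensorInfo struct {
	Name   string
	Offset uint64 // byte offset of the tensor data, relative to the data section
	Size   uint64 // byte length of the tensor data, as declared by the file
}

var errOutOfBounds = errors.New("tensor data extends past end of file")

// validateTensorBounds returns an error unless the declared byte range
// [dataStart+Offset, dataStart+Offset+Size) lies inside a file of
// fileSize bytes. Comparisons are arranged so no addition can wrap around.
func validateTensorBounds(t tensorInfo, dataStart, fileSize uint64) error {
	// Reject offsets that would overflow uint64 when added to dataStart.
	if t.Offset > math.MaxUint64-dataStart {
		return fmt.Errorf("%s: offset overflow: %w", t.Name, errOutOfBounds)
	}
	start := dataStart + t.Offset
	// Compare Size against the remaining bytes instead of computing start+Size.
	if start > fileSize || t.Size > fileSize-start {
		return fmt.Errorf("%s: offset %d size %d in %d-byte file: %w",
			t.Name, t.Offset, t.Size, fileSize, errOutOfBounds)
	}
	return nil
}

func main() {
	const fileSize, dataStart = 4096, 1024

	honest := tensorInfo{Name: "blk.0.weight", Offset: 0, Size: 2048}
	forged := tensorInfo{Name: "blk.1.weight", Offset: 2048, Size: 1 << 40}

	fmt.Println("honest:", validateTensorBounds(honest, dataStart, fileSize))
	fmt.Println("forged:", validateTensorBounds(forged, dataStart, fileSize))
}
```

A check of this kind belongs at parse time, before any code path (including one that uses unsafe) reads the tensor data.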
Attack chain
1. Upload a maliciously crafted GGUF file with an excessively large tensor shape to a publicly reachable Ollama server via HTTP POST.
2. Trigger the vulnerability by calling /api/create, causing the out-of-bounds read during quantization.
3. Use /api/push to push the leaked memory as a model artifact to an attacker-controlled registry.
The leaked data can include environment variables, API keys, system prompts, and other users’ conversation data. Dor Attias notes that engineers who connect Ollama to Claude Code as a local inference backend may inadvertently expose all tool inputs and outputs.
Impact
The Hacker News cites Cyera data indicating that more than 300,000 Ollama servers are exposed to the public Internet. Ollama’s GitHub repository has over 171 k stars and 16 k forks, underscoring the large attack surface.
Mitigation
Upgrade immediately to version 0.17.1 (release notes: https://github.com/ollama/ollama/releases/tag/v0.17.1).
Do not expose Ollama directly to the Internet; place it behind a firewall.
Audit existing instances for unintended public exposure.
Add an authentication proxy or API gateway, as Ollama lacks built‑in authentication.
Review instance logs for suspicious /api/create or /api/push requests.
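For the log review step, a small filter can surface the relevant requests. The sketch below is an illustration under a stated assumption: that your access log prints the HTTP method and the request path on the same line. The isSuspicious helper is a hypothetical name, and a match is only a lead to investigate, since legitimate model creation and pushes use the same endpoints.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// watchedPaths are the two endpoints named in the attack chain.
var watchedPaths = []string{"/api/create", "/api/push"}

// isSuspicious reports whether a log line mentions a POST to a watched path.
// Assumption: method and path appear on the same log line.
func isSuspicious(line string) bool {
	if !strings.Contains(line, "POST") {
		return false
	}
	for _, p := range watchedPaths {
		if strings.Contains(line, p) {
			return true
		}
	}
	return false
}

func main() {
	// Read log text from standard input, one entry per line.
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long lines
	lineNo := 0
	for sc.Scan() {
		lineNo++
		if isSuspicious(sc.Text()) {
			fmt.Printf("line %d: %s\n", lineNo, sc.Text())
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
}
```

Pipe whatever log source your deployment uses into the program on standard input, then compare the matching lines against the client addresses and times you expect.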
In short, the “Bleeding Llama” vulnerability demonstrates the concrete risk of running Ollama without authentication: an attacker can read any secret data held in the process heap, not merely consume GPU resources.
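Because Ollama has no built-in authentication, one common pattern is to keep it bound to localhost and put a token-checking reverse proxy in front. The following is a minimal sketch, not a hardened gateway. It assumes Ollama listens on its default address 127.0.0.1:11434; the PROXY_TOKEN environment variable, the :8080 listen address, and the authorized helper are hypothetical choices for illustration. It also omits TLS, which a real deployment would need so the token is not sent in clear text.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"strings"
)

// authorized reports whether an Authorization header value carries the
// expected bearer token. An empty configured token never authorizes.
func authorized(header, token string) bool {
	got, ok := strings.CutPrefix(header, "Bearer ")
	// Constant-time comparison avoids leaking the token through timing.
	return token != "" && ok &&
		subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// requireToken rejects unauthenticated requests before they reach next.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		r.Header.Del("Authorization") // do not forward the proxy credential
		next.ServeHTTP(w, r)
	})
}

func main() {
	token := os.Getenv("PROXY_TOKEN") // hypothetical variable name
	if token == "" {
		log.Fatal("PROXY_TOKEN must be set")
	}
	// Assumption: Ollama is reachable only on its default local address.
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	log.Fatal(http.ListenAndServe(":8080", requireToken(token, proxy)))
}
```

With this in place, requests without the expected Authorization header never reach /api/create or /api/push. The proxy complements the upgrade to 0.17.1 and does not replace it.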
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
