Blurry Images Create a ‘Comfort Zone’ for Jailbreaking Multimodal LLMs

A new study from Westlake University shows that multimodal large language models become significantly easier to jailbreak when harmful text is rendered as low-resolution, blurry, or noisy images, even though the models still read the text accurately. The study reports an inverted-U risk curve and a simple mitigation that decouples OCR from safety checks.

Machine Learning Algorithms & Natural Language Processing

Jun 15, 2026

Attack Comfort Zone (ACZ) in Multimodal LLMs

Visual degradation (low DPI, blur, noise, distortion, or occlusion) creates an Attack Comfort Zone (ACZ): a range in which the rendered text remains highly readable to multimodal large language models (MLLMs), with OCR accuracy above 93%, yet the models become dramatically more vulnerable to jailbreak attacks.

Experimental Setup

770 deduplicated harmful text queries were rendered into images with varying DPI.

Models evaluated: GPT‑4.1, Claude Sonnet 4.5, Doubao Seed 1.6, Qwen3‑VL, GLM‑4.5V, Intern‑S1.

Metrics: character‑level OCR accuracy, word‑level OCR accuracy, attack success rate (ASR).
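The three metrics can be sketched as follows. The article does not give the paper's exact formulas, so this is a minimal illustration under an assumed definition: OCR accuracy is taken as one minus the normalized edit distance between the reference text and the model's transcript, and ASR as the fraction of queries judged to have produced a harmful response. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_accuracy(reference: str, transcript: str, level: str = "char") -> float:
    """Assumed definition: 1 - normalized edit distance, clamped at 0."""
    if level == "char":
        ref, hyp = list(reference), list(transcript)
    elif level == "word":
        ref, hyp = reference.split(), transcript.split()
    else:
        raise ValueError("level must be 'char' or 'word'")
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Fraction of queries whose response was judged a successful attack."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)
```

Under this definition a single misread character costs far more at the word level than at the character level, which is one reason to report both.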

Key Findings

ASR follows a non‑monotonic, inverted‑U curve across DPI: in the ACZ range OCR stays above 93 % while ASR spikes.

Example: Qwen3‑VL‑32B‑Thinking ASR rises from 36.7 % on clean text to 86.2 % on ACZ images; OCR remains 95.4 % (character) and 93.2 % (word).

Chinese prompts show the same pattern: Doubao Seed 1.6 ASR increases from 16.7 % at 300 DPI to 70.3 % in the ACZ range.

Additional degradations—blur, geometric distortion, interference lines, mosaic, noise, and occlusion—produce similar risk spikes, confirming that the phenomenon is not limited to low resolution.
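One way to make the ACZ criterion concrete is to flag every degradation setting where the text is still readable but ASR sits well above the clean baseline. The sketch below is an illustration, not the authors' procedure: the 93% readability bar comes from the findings above, while the `min_asr_gain` margin and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    setting: str         # label for the degradation level, e.g. a DPI value
    ocr_accuracy: float  # fraction in [0, 1]
    asr: float           # attack success rate, fraction in [0, 1]


def find_comfort_zone(
    measurements: List[Measurement],
    clean_asr: float,
    ocr_floor: float = 0.93,     # readability bar quoted in the article
    min_asr_gain: float = 0.10,  # hypothetical margin, not from the paper
) -> List[str]:
    """Return settings that stay readable while ASR rises above the clean baseline."""
    return [
        m.setting
        for m in measurements
        if m.ocr_accuracy > ocr_floor and (m.asr - clean_asr) >= min_asr_gain
    ]
```

With the Qwen3-VL-32B-Thinking figures quoted above (word-level OCR 93.2%, ASR 86.2% against a clean ASR of 36.7%), such a setting would be flagged, while a setting degraded enough to break OCR would not.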

Visual Cognitive Overload Hypothesis

The authors propose that images just clear enough to be readable require extra computational effort for character recognition. This “visual cognitive overload” delays or weakens shallow‑layer safety checks, allowing harmful content to surface only in deeper layers.

Layer‑wise safety probes show harmful features appear early for clean images but are suppressed in shallow layers for ACZ inputs, emerging later in deeper layers. t‑SNE analysis demonstrates that ACZ samples lie close to high‑fidelity samples in representation space, indicating they are treated as valid visual signals rather than out‑of‑distribution noise.
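The "appears early" versus "emerges later" comparison can be expressed as the first layer at which a probe detects harmfulness. The sketch below assumes a probe has already been trained per layer and produces one score per layer. The threshold and the function name are hypothetical, and the article does not describe how the paper's probes are scored.

```python
from typing import Optional, Sequence


def first_detectable_layer(
    probe_scores: Sequence[float],
    threshold: float = 0.8,  # hypothetical detection bar, not from the paper
) -> Optional[int]:
    """Index of the first layer whose probe score reaches the threshold, else None."""
    for layer, score in enumerate(probe_scores):
        if score >= threshold:
            return layer
    return None
```

Under the hypothesis, this index would be small for clean images and larger for ACZ inputs.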

Structured Cognitive Offloading Defense

A simple mitigation pipeline decouples visual recognition from safety judgment:

Transcription: OCR the image to pure text.

Safety Evaluation: Apply the model’s text-based safety filter to the transcript.

Response: Generate the final answer based on the safety outcome.
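The three steps can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation: the article does not say whether the steps run as separate calls or as one structured prompt inside a single model, and `transcribe`, `is_unsafe`, `generate`, and the refusal text are hypothetical stand-ins supplied by the caller.

```python
from typing import Callable

REFUSAL = "I can't help with that request."  # placeholder refusal text


def offloaded_response(
    image: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical OCR step
    is_unsafe: Callable[[str], bool],    # hypothetical text-only safety check
    generate: Callable[[str], str],      # hypothetical answer generator
    refusal: str = REFUSAL,
) -> str:
    # Step 1 (Transcription): reduce the image to plain text only.
    text = transcribe(image)
    # Step 2 (Safety Evaluation): judge the transcript as ordinary text,
    # so the verdict no longer depends on visual quality.
    if is_unsafe(text):
        # Step 3 (Response): refuse when the transcript is judged unsafe.
        return refusal
    # Step 3 (Response): otherwise answer from the transcript.
    return generate(text)
```

The key design choice is that the safety check sees only the transcript and never the degraded image.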

Applying this pipeline to Qwen3-VL reduces ACZ ASR from about 67% to 4% without increasing false rejections on a clean OCR subset. The trade-off is an increase of about 102% in average output length, meaning responses are roughly twice as long.

Implications

Multimodal safety alignment depends on input modality and visual quality, not solely on semantic understanding. Visual‑text compression techniques that push models into the ACZ may incur hidden security costs.

Paper: Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Full paper: https://arxiv.org/pdf/2605.07250

Code and data: https://github.com/Westlake-AGI-Lab/ACZ-Jailbreak

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
