When Blurry Images Create an Attack Comfort Zone for Multimodal LLMs

Researchers at Westlake University's AGI Lab show that when harmful text is rendered as a low-resolution, blurry, or noisy image, multimodal large language models can still read the content, yet their safety filters fail to trigger. This mismatch creates an "attack comfort zone" that dramatically raises jailbreak success rates across several models.

Tags: OCR, jailbreak, multimodal LLM