DeepSeek’s New Multimodal Paradigm Compresses Images 7,056× and Outperforms GPT‑5.4 and Claude in Visual Reasoning
DeepSeek’s multimodal model, built on the V4‑Flash architecture with a visual‑primitive reasoning approach, compresses a full‑resolution image 7,056‑fold, matches or exceeds GPT‑5.4 and Claude‑Sonnet‑4.6 on counting and spatial‑reasoning benchmarks, and does so at dramatically lower compute cost.
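A compression ratio like this is typically just the raw pixel count divided by the number of visual tokens the model retains. The sketch below illustrates that arithmetic; the image size and token count are illustrative assumptions, not figures from the article (only the 7,056× ratio itself is), chosen so the numbers work out exactly:

```python
def compression_ratio(height: int, width: int, num_tokens: int) -> float:
    """Raw pixel count divided by the number of visual tokens kept."""
    return (height * width) / num_tokens

# Illustrative example: a 1176x1176 image reduced to 196 visual tokens.
# 1176 = 84 * 14, so 1176^2 = 7056 * 196, giving exactly a 7,056x ratio.
print(compression_ratio(1176, 1176, 196))  # -> 7056.0
```

Any (height, width, token-count) triple with the same quotient would report the same headline number, which is why such ratios say little on their own about what visual detail survives the compression.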
