Can Multimodal LLMs Beat Humans in Real Web Search? GPT‑5.2 Scores Only 36% on New BrowseComp‑V3 Benchmark

A new multimodal browsing benchmark, BrowseComp‑V3, shows human experts reaching a 68.03% success rate, while the strongest closed‑source model, GPT‑5.2, manages just 36.17%. The gap highlights current limitations in deep, web‑scale visual‑text reasoning and the critical role of tool‑augmented agents.

GPT-5.2 · OmniSeeker · human performance
